FOREWORD


The Software Engineering Laboratory (SEL) is an organization 
sponsored by the National Aeronautics and Space Administra- 
tion Goddard Space Flight Center (NASA/GSFC) and created for 
the purpose of investigating the effectiveness of software 
engineering technologies when applied to the development of 
applications software. The SEL was created in 1977 and has 
three primary organizational members: 

NASA/GSFC (Systems Development and Analysis Branch) 

The University of Maryland (Computer Sciences Department) 
Computer Sciences Corporation (Flight Systems Operation) 

The goals of the SEL are (1) to understand the software
development process in the GSFC environment; (2) to measure
the effect of various methodologies, tools, and models on
this process; and (3) to identify and then to apply
successful development practices. The activities, findings, and
recommendations of the SEL are recorded in the Software
Engineering Laboratory Series, a continuing series of reports
that includes this document.

Single copies of this document can be obtained by writing to:

Frank E. McGarry
Code 582.1
NASA/GSFC
Greenbelt, Maryland 20771
SIXTH ANNUAL SOFTWARE ENGINEERING WORKSHOP


ABOUT THE WORKSHOP 


The Sixth Annual Software Engineering Workshop was held on December 2, 1981, 
at Goddard Space Flight Center in Greenbelt, MD. Nearly 200 people, represent- 
ing 6 universities, 19 agencies of the federal government, and 30 private 
organizations, attended the meeting. 

As in the past 5 years, the major emphasis for this meeting was the reporting
and discussion of experiences in the identification, utilization, and evaluation
of software methodologies, models, and tools. Eleven speakers, in four
separate sessions, participated in the meeting; each session followed a panel
format with heavy participation from the audience.

The workshop is organized by the Software Engineering Laboratory (SEL), whose 
members represent the NASA/GSFC, University of Maryland, and Computer Sciences 
Corporation (CSC). The meeting has been an annual event for the past 6 years
(1976 to 1981), and there are plans to continue these yearly meetings as long
as they remain productive.

The record of the meeting is generated by members of the SEL and is printed and 
distributed by the Goddard Space Flight Center. All persons who are registered 
on the mailing list of the SEL receive copies of the proceedings at no charge.

Additional information about the workshop or about the SEL may be obtained by 
contacting: 


Mr. Frank McGarry 
Code 582.1 
NASA/GSFC 

Greenbelt, MD 20771 
301-344-5048 
AGENDA

SIXTH ANNUAL SOFTWARE ENGINEERING WORKSHOP
NASA/GODDARD SPACE FLIGHT CENTER
BUILDING 3 AUDITORIUM
DECEMBER 2, 1981

8:45 a.m.    INTRODUCTORY REMARKS                   F. E. McGarry/GSFC

9:00 a.m.    MORNING CHAIRMAN                       F. E. McGarry

             SESSION NO. 1
             "Evaluating Software Development Characteristics"

             "Analyzing Error Characteristics in Software
             Development"                           D. Weiss (NRL)

             "Evaluating the Effects of an Independent
             Verification and Validation Team"      J. Page (CSC)

             "Assessment of Software Measures in the
             Software Engineering Laboratory"       V. Basili (University of MD)

10:30 a.m.   BREAK

10:45 a.m.   SESSION NO. 2
             "Software Metrics"

             "The Quantitative Impact of Four Factors on
             Work Rates Experienced During Software
             Development"                           J. Gaffney/R. Judge (IBM)

             "Software Quality Metrics for Distributed
             Systems"                               J. Post (Boeing Aerospace)

             "Identification and Evaluation of Software
             Metrics"                               D. Card (CSC)

12:45 p.m.   LUNCH

1:15 p.m.    AFTERNOON CHAIRMAN                     V. Basili

             SESSION NO. 3
             "Software Models"

             "A Bayesian Approach to Parameter
             Estimation in the Jelinski-Moranda Software
             Reliability Model"                     B. Littlewood/A. Sofer (GW University)

             "The Problem of Resonance in Technology
             Usage"                                 H. Sayani/C. Svoboda (ASTEC)

2:45 p.m.    BREAK

3:00 p.m.    SESSION NO. 4
             "Software Methodologies"

             "A Methodology for Improving Software
             Reliability"                           H. Mills/M. Dyer (IBM)

             "Selecting a Software Development
             Methodology"                           B. Jones (Hughes)

             "Development Techniques for Generic
             Software"                              R. Hamilton (Bell Labs)

5:00 p.m.    ADJOURN



Workshop Introduction 


The software engineering workshop is one attempt to promote the interchange of 
ideas, experiences and approaches to the measurement and evaluation of varying 
techniques used in the software development process. The first meeting was 
held in August of 1976 in partial response to NASA's concern for the apparent 
gap between the availability of state-of-the-art software development approaches 
and the actual utilization of these techniques. Also, the First International 
Conference on Software Engineering had been held in Washington, DC the previous 
year and had stimulated interest and concern within the NASA community. 

The first workshop at Goddard essentially surveyed some available state-of-the- 
art development techniques to determine if they would be applicable in the NASA 
environment. The meeting was attended by approximately 25 people. As a result 
of this first workshop, NASA/GSFC initiated efforts to investigate the effective- 
ness of the numerous available approaches to developing software. 

Within a few months after the first workshop, an organization was created 
(called the Software Engineering Laboratory--SEL) which was chartered to 
measure the impact that various methodologies, tools, and models had on appli- 
cations software within NASA/GSFC. The SEL was formed as a partnership between 
NASA/GSFC, the University of Maryland, and Computer Sciences Corporation (CSC). 
During the first year of operation, the SEL concerned itself with the approaches 
to conducting software development experiments and to collecting development 
data for study. The SEL became very interested in finding others who were 
attempting to do similar things. 

The Second Software Engineering Workshop was held in September 1977 at NASA/GSFC
with the central theme being 'Who else is performing software experiments
and collecting software data?' Approximately 55 persons attended this meeting,
and many approaches and experiences relating to software experiments and data
collection were discussed, both during presentations and during informal
discussions.

The third meeting was held in September of 1978 at NASA/GSFC. Continued 
emphasis was placed on data collection and software experiments. Many of
the discussions focused on how to collect software data
and how to conduct software experiments successfully. This meeting was
attended by approximately 70 people. 

The fourth and fifth meetings again were held at NASA/GSFC in November of 1979 
and November of 1980 respectively. During these sessions, the emphasis was 
once again placed on data collection and the actual experiences with software 
methodologies, models, tools, and measures. 

The sixth meeting is another attempt to listen to experiences that people have
had in attempting to apply various modern programming practices. Although the
workshops occasionally seem to stray from the central theme of data
collection and software experiments, the major objectives are still essentially
being met. For example, these workshops have been instrumental in providing
suggestions and guidance to the efforts within the SEL at Goddard. The SEL has
now been in existence for about 6 years and has closely monitored 34 applications
projects at NASA/GSFC, collecting approximately 15 megabytes of development data.
This data has continually been studied and evaluated and has led to numerous
measurements and evaluations of software methodologies, models, and tools.

Many effective relationships were initiated through the workshops, and a great
number of experiences and experimental results, as well as the data itself, have
been exchanged between organizations. The Sixth Workshop will attempt to
stimulate further exchanges.
WORKSHOP BACKGROUND

1ST (AUGUST 1976)   STIMULATED BY
                    • 1ST INTERNATIONAL CONFERENCE ON S.E. (1975)
                    • NASA CONCERN FOR SOFTWARE TECHNOLOGY
                    • APPARENT LACK OF TECHNOLOGY UTILIZATION

                    CREATION OF SOFTWARE ENGINEERING LABORATORY (SEL)

2ND (SEPT. 1977)    • WHO IS COLLECTING SOFTWARE DATA
                    • WHO IS EXPERIMENTING WITH TECHNOLOGY

3RD (SEPT. 1978)    • HOW DO YOU VALIDATE DEVELOPMENT DATA
                    • HOW DO YOU INTERPRET THE DATA

4TH (NOV. 1979)     • SOME RESULTS OF EXPERIMENTS
                      - SOFTWARE MODELS
                      - SOFTWARE METRICS

5TH (NOV. 1980)     • FURTHER EXPERIMENTS, DATA COLLECTION
                      & PROPOSED EXPERIMENTS


SOFTWARE ENGINEERING LABORATORY (NASA/GSFC)

• CREATED FALL 1976 (NASA/GSFC-UNIV. MD.)

• WHY
   • PROFILE OF CURRENT DEVELOPMENT TECHNIQUES
   • EVALUATE EFFECTIVENESS OF MPP
   • APPLY IMPROVED METHODS TO SOFTWARE AT GSFC

• HOW TO PROCEED
   • EXTRACT DETAILED DATA FROM ACTIVE TASKS (FORMS/DATA COLLECTION/VALIDITY)
   • GENERATE CONTROL EXPERIMENTS (EXPERIMENTAL DESIGN/STATISTICAL ANALYSIS)
   • QUALIFY 'GOOD' AND 'BAD' SOFTWARE (MODELS/MEASURES/METRICS)
MEASURING SOFTWARE IN THE SEL

BASIS FOR ANALYSIS

• LABORATORY EXPERIMENTS .............. 34 PROJECTS
• INFORMATION MONITORED ............... 1.6 MILLION L.O.C.
• PROGRAMMERS/MANAGERS REPRESENTED .... 115 PEOPLE
• DATA EXTRACTED (FORMS, TOOLS,
  SUBJECTIVE) ......................... 40 M BYTES ON DATA BASE (15,000 FORMS)
                                        200 QUALIFYING PARAMETERS
• METHODOLOGIES APPLIED ............... VARIOUS MODELS, TOOLS

NASA/SEL
SOFTWARE ENGINEERING LABORATORY
CURRENT ACTIVITIES (FY 81)

PROJECTS BEING MONITORED

NAME          END DATE   APPROXIMATE SIZE (L.O.C.)
DE-A (ADS)    6/81       68,000
DE-B          6/81       65,000
DADS          5/81       16,000
ADDS          10/81      18,000
RADMAS        6/82       50,000
AADS          9/82       15,000
DECAP         6/81       12,000
GEDAP         7/81        4,000

APPROACHES UNDER STUDY

• INDEPENDENT VERIFICATION & INTEGRATION
• CONFIGURATION MANAGEMENT TOOL
• REQUIREMENTS LANGUAGE (MEDL-R)
• INFORMATION HIDING
• DATA ABSTRACTION
• STRUCTURED ANALYSIS (YOURDON & DEMARCO)
• N² CHARTS FOR DESIGN

ANALYSIS ACTIVITIES

• RELIABILITY MODEL EVALUATION (MUSA, GOEL)
• APPROACHES TO SOFTWARE TESTING
• MEASURES - METRICS FOR SOFTWARE (MCCABE, HALSTEAD, MCCALL)
• TOOLS EVALUATION (PWB, MEDL-R, CAT, SAP, ...)
• METHODOLOGY EVALUATION
TOPICS FOR 6TH WORKSHOP

ANALYZING ERROR CHARACTERISTICS

MEASURES FOR SOFTWARE

MODELS FOR RELIABILITY & DEVELOPMENT

METHODOLOGIES FOR DEVELOPMENT



SUMMARY OF THE SESSIONS: 

SIXTH ANNUAL SOFTWARE ENGINEERING WORKSHOP 


Suellen Eslinger 
COMPUTER SCIENCES CORPORATION

and 

THE GODDARD SPACE FLIGHT CENTER 
SOFTWARE ENGINEERING LABORATORY 


Prepared for the 
NASA/GSFC 


Sixth Annual Software Engineering Workshop 



SESSION 1 - EVALUATING SOFTWARE DEVELOPMENT
CHARACTERISTICS


Dave Weiss - "Analyzing Error Characteristics in Software
Development"

The first speaker of the first session was Dave Weiss from
the Naval Research Laboratory (NRL). The purpose of his
presentation was to characterize software changes in two
different software development environments. Changes
required to correct errors formed one subcategory of the
software changes studied. Data was used from several projects
at GSFC and at NRL; data for the GSFC projects was collected
by the Software Engineering Laboratory (SEL).

Although the two environments were quite different, the
characteristics of the software changes were found to be
very similar. For example, in both environments relatively
few errors (approximately 5 percent) took more than 1 day to
correct, and relatively few errors (approximately 2 to
5 percent) were caused by requirements problems. Although
the error characteristics detected may not apply to other
environments, another group of software developers could
perform the same type of study to characterize errors in
their own environment. The results of such a study can help
determine where effort should be focused to reduce errors
and thus improve the reliability of software developed in
a given environment.

In response to questions from the audience, Weiss clarified
several points:

• Interface errors made up only a small part of the
errors that affected more than one module. Unlike
similar studies in the literature, this study found
relatively few interface errors in the two environments.
• All projects studied were completed, but no data
was used from the maintenance phase of the projects.

• Changes were tracked from the time that a module
was entered into the library. In both environments this
process took place after the programmer had coded, compiled,
and tested the module, i.e., at the completion of unit
testing.

• Neither environment had a formal configuration
control board. The programmer was responsible for determining
the correctness of the change, and the effort to fix an
error was accepted to be the amount of time the programmer
said it took to make and test the change.

• The NRL environment had even less configuration
control than the GSFC environment. Configuration control in
the NRL project consisted only of restricting library
updates to the project leaders.

Jerry Page - "Evaluating the Effects of an Independent Veri- 
fication and Validation Team" 

The next speaker of the session was Jerry Page from Computer
Sciences Corporation (CSC). The purpose of his presentation
was to evaluate the effectiveness of a particular methodol- 
ogy when utilized in the development of application soft- 
ware. Experiments in applying independent verification and 
integration (V&I) were conducted at GSFC during the develop- 
ment of two ground-based software projects. CSC was re- 
sponsible for the V&I effort under contract to GSFC. 

Detailed data for the projects was collected by the SEL. 

The two V&I projects were compared to two similar earlier 
projects monitored by the SEL for which V&I had not been 
used. Seven specific measures were used to weigh the
effects of applying the methodology. The only clearly
favorable effect found was a reduction in the number of
requirements errors. Furthermore, the V&I experimental
projects were costly, and the resulting software seemed to be as
error prone as the software produced by the projects for 
which V&I was not used. However, the speaker noted that as 
more experience is gained with a particular methodology, 
better results are usually achieved. Thus, Page indicated, 
more experimentation with V&I is warranted, especially with 
projects of a larger size (10 to 12 staff-years) and/or with 
high reliability requirements. 

This presentation generated a large response from workshop 
participants. The following points were clarified by Page 
in answer to questions from the audience: 

• The V&I teams represented approximately 15 to
18 percent of the development effort in size and were
similar to the development teams in experience.

• In general, the V&I teams worked behind the devel- 
opment teams, verifying the completed code while new code 
was being developed. 

• The activity of code reading was performed by the 
development teams as a standard practice. Since the V&I 
teams were relatively small compared to the amount of code 
produced, the V&I teams emphasized testing of the software
and not code reading. In fact, testing was found to be the 
most cost-effective part of the V&I effort. 

• No investigation was made of the effect of the V&I 
teams on the readability or the maintainability of the 
code. Since the V&I teams were not directly involved in the 
code reading activity, their presence was not expected to 
affect the quality of the code in readability or maintain- 
ability. 


S. Eslinger
CSC 
3 of 21 



• In the four projects studied, similar methodologies 
were used, except for the presence of the V&I teams. 

• In all four projects, acceptance testing was per- 
formed by an independent team, whose effort did not overlap 
the effort of either the development teams or the V&I 
teams. In particular, the V&I teams did not verify the 
acceptance tests. Thus, the quality of acceptance tests was 
not perceived to differ significantly for the four projects. 

• In general, most errors found during acceptance testing
were not due to testing with real data. Since real
data is not usually obtained until very late in acceptance
testing, most testing is performed with simulated data.

• A member of the audience suggested that the value
of the V&I efforts may appear after acceptance testing.
Page responded that in this environment, on average,
only 15 percent of the total cost is incurred during the 
maintenance phase. Thus, a significant savings in cost is 
not expected for the V&I projects during this phase. How- 
ever, all of the projects studied are still being monitored, 
and the data will continue to be analyzed. 

• There were some instances in which the development 
teams relied upon the V&I teams to find their errors. 

• There was also an overlap in errors found by the
development teams and the V&I teams, although the percentages
have not been computed.

• CSC's Milt Phenneger, who participated in the V&I
effort, suggested that the V&I process could be improved by 
tailoring the design and scheduling of the software releases 
to an independent testing effort. However, the speaker 
noted that the purpose of the experiment was to assess the 
effect of independent V&I without perturbing the existing 
software development process. 
Vic Basili - "Assessment of Software Measures in the Software
Engineering Laboratory"

The last speaker of the session was Vic Basili from the 
University of Maryland. This presentation concentrated on 
software measures as studied in the SEL. He outlined the 
characteristics of measures examined by the SEL during the 
past 4 years. His discussion focused on various classes of 
measures, such as subjective and objective measures of the 
software process and product, cost, and quality. He dis- 
cussed the use of metrics for categorization, evaluation, 
and prediction. One result obtained from the analysis of 
SEL data is that many of the complexity measures, including 
the Halstead measures, are highly correlated with each other 
and with the number of lines of code. This is a disappoint- 
ing result because it indicates that in this environment 
none of the more sophisticated complexity measures is a 
better predictor than the simple measure of lines of code. 
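The kind of comparison behind this result can be sketched with the Pearson correlation coefficient. The sketch is illustrative only: the per-module values are hypothetical, not SEL data, and the function name `pearson` is ours. It shows how a complexity measure that grows roughly in proportion to lines of code yields a coefficient near 1, which is why such a measure adds little predictive information beyond size.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-module values (illustration only, not SEL data).
loc = [120, 340, 95, 410, 220, 180]
halstead_length = [610, 1700, 480, 2150, 1100, 900]
cyclomatic = [9, 24, 7, 31, 15, 12]

for name, measure in [("Halstead length", halstead_length), ("cyclomatic", cyclomatic)]:
    print(f"r(LOC, {name}) = {pearson(loc, measure):.2f}")
```

A coefficient this close to 1 means the measure and the line count rank and scale modules almost identically, so a model using the measure gains little over one using size alone.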

A cost model has been developed using subjective metrics to
modify the basic size/effort equation. Other results indicate
that in this environment productivity correlates positively
with methodology but with few other factors, including size.
Also, subjective measures of quality correlate positively
with methodology and inversely with complexity.

In response to questions from the audience, Basili clarified 
the following points: 

• Examples were given of the subjective measures of 
quality, of the methodology measures, and of the complexity 
measures for which data is being collected by the SEL. 

• On a typical project studied by the SEL, methodologies
tend either to be used together as a group or to be avoided
completely. As methodology is used to a larger extent,
quality and productivity tend to increase. However, the
measures of the degree of use of a particular methodology
do not function individually as predictors. Rather, the
overall set of methodology measures should be used.





SESSION 2 - SOFTWARE METRICS 


Bob Judge - "The Quantitative Impact of Four Factors on Work 
Rates Experienced During Software Development" 

The first speaker of the second session was Bob Judge from
the International Business Machines Corporation (IBM), who
presented the results of a study done jointly with John 
Gaffney. The purpose of the study was to attempt to use 
parameters (or factors) to explain the effort required for 
developing software with the end goal of building a cost 
estimation model. 

The effects of four factors on work rate were measured for
nine components of the software development life cycle. The
four general factors studied were the personnel type
(programmers versus systems engineers), the product (type of
software application), the computer (one of three host
computers), and the code type (new versus modified software).
Data was used from projects developed within IBM. The
estimation process was more effective for some components of
the life cycle than for others. The four factors provided the
best estimates of work rate for the components dealing with
implementation and the worst estimates of work rate for the
requirements analysis phase. Overall, 39 percent of the
variation in work rate for the projects studied was
explained.
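Assuming "percent of the variation explained" is read in the usual regression sense, as the coefficient of determination R², the computation can be sketched as follows. The study's actual model form is not given here; the sketch uses a single categorical factor (code type) whose levels predict work rate by their group means, and all data values and function names are hypothetical.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Fraction of the variation in `observed` explained by `predicted`."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def group_mean_predictions(levels, observed):
    """Predict each observation by the mean of its factor level (one-way model)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, observed):
        sums[level] += y
        counts[level] += 1
    return [sums[lv] / counts[lv] for lv in levels]


# Hypothetical work-rate values for six components, grouped by code type.
code_type = ["new", "new", "new", "modified", "modified", "modified"]
work_rate = [300.0, 340.0, 320.0, 500.0, 440.0, 470.0]

pred = group_mean_predictions(code_type, work_rate)
print(f"R^2 = {r_squared(work_rate, pred):.2f}")
```

With several factors, the same ratio is computed from the predictions of the combined model; a value of 0.39 would mean that 61 percent of the variation remains unexplained by the factors.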

In response to questions from the audience, Judge clarified
the following points:

• The study was based on historical data for com- 
pleted projects. 

• The number of samples used for the analysis was the 
number of projects studied. However, not every project 
necessarily covered all nine components of the software life 
cycle. 

• The cost data used came directly from customer
charges and was, therefore, considered highly accurate.
Inaccuracies, however, could be present in the distribution
of costs among the nine life cycle components. Costs
were expressed in staff-months instead of dollars to
eliminate the effects of inflation. The size data used
could contain some inaccuracies but, on the whole, it was
considered fairly accurate.

• The purpose of the study was to obtain a predictive 
model for cost estimation. 

Jonathan Post - "Software Quality Metrics for Distributed
Systems"

The second speaker for the session was Jonathan Post from 
Boeing Aerospace Corporation, who discussed measures for 
distributed processing systems. As part of a project to 
define and evaluate measures for distributed systems, per- 
sonnel investigated the similarities and differences between 
measures applicable to distributed systems and those appli- 
cable to single-processor systems. 

The starting point for the study was the set of factors or 
qualities desirable in a software system and the criteria 
for evaluating those factors as defined by J. McCall from 
the General Electric Company. Post added criteria appli- 
cable to distributed systems to some of McCall's factors, 
and he defined additional factors and associated criteria 
for distributed systems. The rationale for these additions 
was presented in some detail. Post indicated that during 
the next year data will be collected for distributed systems 
developed by Boeing Aerospace; it will then be analyzed in 
an attempt to evaluate the quality measures that have been 
defined.

In response to questions from the audience, Post clarified
the following points: 

• A definition of a distributed system is critical to
the project because it determines which projects will be
selected for data collection. Since no consensus currently
exists in the community on the exact definition of a
distributed system, significant effort was expended on
establishing what this project considered to be a
distributed system.

• The data will be collected using McCall's approach 
of a standard worksheet filled out by project personnel. 
Information will be extracted from these forms by a single 
person in an effort to eliminate the potential for bias in 
the responses. Interviews will also be held with project 
personnel to establish the validity of the data. Since Post 
is familiar with practices used in the projects being 
studied, he expected that his role in the company as a 
quality assurance monitor would help him obtain valid data. 

• The set of quality metrics established includes 
some system metrics and some software metrics. Some of the 
distributed system factors are the same as those established 
by McCall. Other factors have been modified (i.e., new
criteria added to those given by McCall), while still others
are entirely new. 

Dave Card - "Identification and Evaluation of Software 
Metrics" 

The last speaker of the session was Dave Card of CSC. The 
purpose of his presentation was to describe a procedure for 
identifying the underlying qualities measured by a set of 
software measures. For a number of actual software proj- 
ects, values have been determined by the SEL for 200 meas- 
ures that cover the range of GSFC software development 
activities. 
For this study, data was used from 22 projects for 60 measures
describing the software development process and product.
The product measures studied included size and
resource measures, and the process measures were ratings of 
the degree of use of various methodologies, tools, and docu- 
mentation procedures. Six of these measures, for which 
there were insufficient examples of use in the data, were 
rejected by a test of normality. A factor analysis was per- 
formed on the remaining 54 measures that extracted 5 factors 
accounting for 77 percent of the variance of the original 
data. The factors can be thought of as the underlying inde- 
pendent qualities being measured by the 54 measures. The 
five factors represented methodology intensity, project 
size, computer usage, quality assurance, and change rate. 
Card emphasized that this procedure produces a descriptive 
model, not a predictive model, and that it is an interme- 
diate step toward further research. 
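The "percent of variance accounted for" can be illustrated by the share of the eigenvalue total captured by the leading factors, assuming a principal-component style extraction on the correlation matrix of the standardized measures (the study's exact extraction method is not stated here). For p standardized measures the eigenvalues sum to p. The eigenvalues and function names below are hypothetical.

```python
def variance_explained(eigenvalues, k):
    """Share of total standardized variance captured by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def factors_needed(eigenvalues, target_share):
    """Smallest k whose leading eigenvalues reach the target share of variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target_share:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6).
eigs = [2.5, 1.5, 1.0, 0.5, 0.3, 0.2]
print(f"3 factors explain {variance_explained(eigs, 3):.0%} of the variance")
print(f"factors needed for 77 percent: {factors_needed(eigs, 0.77)}")
```

In the study's terms, 5 factors covering 77 percent of the variance of 54 measures means that a handful of underlying qualities summarize most of what the measures record.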

This presentation generated considerable audience interest. 
In response to questions, Card briefly described the factor
analysis procedure and clarified the meanings of several 
factors. He also expanded upon the following points: 

• The factors themselves are not directly measur- 
able. The factor analysis procedure, however, computes the 
correlation of the original variables (i.e., measures) with 
each of the factors. The measures shown as contributing to 
each factor were those whose correlations with the factor
were significant at the 0.01 level.

• Variance can be viewed as the amount of information 
contained in the data. Thus, the factor model produced ac- 
counted for 77 percent of the information in the 54 measures 
over the 22 projects. 

• The 200 measures for which data is collected by the
SEL were originally selected as completely characterizing
the GSFC software development activity. The 60 measures
used in this particular study consisted of all those related
to the software development process or product. Of these,
54 passed the test of normality and were used in the factor
analysis.

• The measures reflecting the degree of use of a 
particular methodology, tool, or documentation procedure are 
not binary variables but are ratings on a scale of 0 to 5. 
These ratings, reflecting the degree of use of each proce- 
dure, were assigned to each project by a single group of 
people. 

• The factor procedure does not produce a predictive 
model. It provides information different from the correla- 
tions among variables. For instance, although the produc- 
tivity measure was not significantly correlated with the 
methodology intensity factor, it cannot be implied or
inferred that productivity is independent of any specific
methodology. In fact, the productivity measure may be 
highly correlated with the degree of use of an individual 
methodology. 

• The approach followed in this study is different 
from that generally followed. Usually, studies select de- 
sirable qualities and then seek measures of these quali- 
ties. Here, data from a number of measures is collected, 
and the qualities being measured by this data are then iden- 
tified. 

• Several people besides the speaker pointed out that 
these results reflect the environment being studied by the 
SEL and that they may not be applicable to other environ- 
ments. 
SESSION 3 - SOFTWARE MODELS


Ariela Sofer - "A Bayesian Approach to Parameter Estimation
in the Jelinski-Moranda Software Reliability
Model"

The first speaker of the third session was Ariela Sofer from 
the George Washington University, who presented the results 
of work done jointly with Bev Littlewood. The purpose of 
the presentation was to evaluate the effectiveness of the 
Jelinski-Moranda software reliability model. 

Error data provided by John Musa from Bell Laboratories was 
used to perform the evaluation. Estimates produced by the 
Littlewood model from this data were shown to be better than 
similar estimates obtained from the Jelinski-Moranda model. 
Several shortcomings in the Jelinski-Moranda model were 
enumerated. In particular, the estimates obtained from this 
model were consistently too optimistic. A Bayesian
reparameterization of the Jelinski-Moranda model was presented,
and estimates produced by the standard and reparameterized
versions of the Jelinski-Moranda model for the error data were
compared. This comparison showed that the reparameterized 
Jelinski-Moranda model produced better results than the 
standard version. 

In response to questions from the audience, Sofer clarified 
the following points: 

• In the error data used, the times between failures
were calculated as the execution times between program
failures. John Musa, who collected the data, further explained
that a program failure was considered to be any occasion on 
which the program did not perform according to its require- 
ments. 
• The models being evaluated assume that the times
between failures are independent. This may not be the case
with actual data.

• The models assume that when a program failure 
occurs, the error is corrected before execution of the pro- 
gram continues. 
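For readers unfamiliar with the model under discussion, the sketch below shows a classical maximum-likelihood fit of the Jelinski-Moranda model, for contrast with the Bayesian reparameterization presented in the talk; it is not Sofer and Littlewood's method. In the standard formulation, the program starts with N faults, each contributing a constant hazard phi, and one fault is removed at each failure, so the i-th time between failures is exponential with rate phi * (N - i + 1). The failure times, function names, and the search bound `n_max` are assumptions for illustration: when the data show little reliability growth, the likelihood can keep increasing with N, and the search then stops at that bound.

```python
import math


def jm_log_likelihood(times, n_faults, phi):
    """Log-likelihood of inter-failure times under the Jelinski-Moranda model.

    The i-th time (i = 1..n) is taken as exponential with rate
    phi * (n_faults - i + 1): each remaining fault contributes hazard phi,
    and one fault is assumed removed after every failure.
    """
    ll = 0.0
    for i, t in enumerate(times, start=1):
        rate = phi * (n_faults - i + 1)
        ll += math.log(rate) - rate * t
    return ll


def jm_fit(times, n_max=1000):
    """Profile-likelihood estimate of (N, phi), searching integer N in [n, n_max]."""
    n = len(times)
    best = None
    for big_n in range(n, n_max + 1):
        # For fixed N, setting d(logL)/d(phi) = 0 gives phi = n / sum((N - i + 1) * t_i).
        exposure = sum((big_n - i + 1) * t for i, t in enumerate(times, start=1))
        phi = n / exposure
        ll = jm_log_likelihood(times, big_n, phi)
        if best is None or ll > best[2]:
            best = (big_n, phi, ll)
    return best


# Hypothetical execution times between successive failures (illustration only).
times = [7, 11, 8, 10, 15, 22, 20, 25, 28, 35]
n_hat, phi_hat, _ = jm_fit(times)
print(f"N estimate: {n_hat}, phi estimate: {phi_hat:.5f}")
print(f"estimated faults remaining: {n_hat - len(times)}")
```

For these hypothetical times, which lengthen as faults are removed, the search settles on an interior maximum rather than on `n_max`; an estimate at the bound would indicate that the data support no finite fault count.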

Disagreement with the approach presented in Sofer's talk was
evidenced by comments from John Musa and Nozer Singpurwalla. 
Musa stated that it was unfortunate that Littlewood was not 
present at the workshop to participate. Certain other 
points were made as follows: 

• Musa stated that he had published a comparable re- 
parameterization of the Jelinski-Moranda model in 1975. 

• Both Musa and Singpurwalla pointed out that there 
are problems with using quantile-quantile (Q-Q) plots to 
evaluate the models. Q-Q plots are based on an assumed dis- 
tribution of the random variable being studied. Thus, they 
are sensitive to the choice of this distribution for which 
no clear criteria are available. 

• Furthermore, Singpurwalla noted that if a uniform 
prior distribution were assumed, the Bayesian model should 
have given the same result as the original Jelinski-Moranda 
model. The fact that it did not suggests an error in the
calculations.

• Musa said that the flaws in this approach to com- 
paring reliability models were pointed out to him by Amrit 
Goel. Musa relayed this information to Littlewood but has 
not yet received a response from him. 

Hasan Sayani - "The Problem of Resonance in Technology Usage" 

The second speaker of this session was Hasan Sayani from 
ASTEC Corporation, who presented the results of work done 
jointly with Cyril Svoboda. His presentation focused on the
management considerations of introducing tools into any 
software development environment. 

The discussion was based on observations made while
consulting with a number of companies in this field. Sayani
stressed the importance of an appropriate tool environment
for software development and discussed, from both the user's
and the manager's point of view, the problems involved in
implementing such an environment.

In particular, Sayani identified specific recommendations 
(both dos and don'ts) to guide the process of adopting 
tools. The central theme of his presentation was the need 
for a systems approach to the management of software tech- 
nology. 

This presentation generated considerable audience interest. 
The chairman of the afternoon sessions, Vic Basili, remarked 
that Sayani had presented a comprehensive list with which he 
agreed. The speaker clarified the following points in the 
ensuing discussion:

• The tools whose implementations were studied in- 
cluded PSL/PSA, data base design tools, process design 
tools, and librarian systems. 

• Members of the audience remarked that the study 
appeared to be applicable to the implementation of other 
technologies in addition to tools. Sayani agreed and stated 
that the approach might also be applied to introducing tech- 
nology to developing nations. 

• Users generally agree that tools are oversold. This
situation creates management problems.

• Methodologies and tools tend to be sold to people 
with weak systems backgrounds who do not understand how the 
new technologies interact with the total software develop- 
ment life cycle. 
• The training and maintenance of a toolsmith group
is an important part of the tool implementation process to 
avoid the problem of tools falling into disuse when key 
people leave the environment. 

• Companies should also standardize and institu- 
tionalize these tools to enforce their use. 

• A member of the audience remarked that Japanese 
management techniques might be applicable to this topic. 
Sayani responded that certain of their techniques would be 
pertinent but others would not because of cultural differ- 
ences. However, the Japanese have adopted the use of cer- 
tain technologies that were developed here but are not as 
widely used in this country. For example, there are a large 
number of PSL/PSA users in Japan. 
SESSION 4 - SOFTWARE METHODOLOGIES


Mike Dyer - "The Clean Room Software Development Process" 

The first speaker of the fourth session was Mike Dyer from 
IBM, who presented the results of work done jointly with 
Harlan Mills. The purpose of the presentation was to de- 
scribe the mechanics of the "clean room" software develop- 
ment process. Pilot projects for this approach are still 
being set up. 

After the preparation of a structured specification, the 
software development process is divided between two groups 
of people: design engineers and product engineers. The
design engineers will design and code the software product
with the goal of producing first-time correct code. No use 
of the computer will be made by the design engineers in ac- 
complishing this goal; instead, extensive inspections and 
reviews will be conducted. The product engineers will per- 
form operational testing on the code produced by the design 
engineers with the goal of testing for the customer environ- 
ment. Tests will be selected randomly from a set of tests 
developed by the product engineers from the structured spec- 
ification, and errors identified by the product engineers 
will be returned to the design engineers for correction. 
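The test-selection step can be sketched as random sampling from a database of test cases derived from the structured specification. In the Python sketch below, the `SpecTest` record, the capability names, and the fixed random seed are hypothetical. The sketch shows only that each run draws distinct tests at random from a pool covering the specified capabilities and collects the failing tests for return to the design engineers.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpecTest:
    """One entry in a (hypothetical) test-case database."""
    name: str
    capability: str           # specification capability the test exercises
    run: Callable[[], bool]   # True if the product behaved as specified


def select_tests(database: List[SpecTest], k: int, seed: int = 0) -> List[SpecTest]:
    """Draw k distinct test cases at random from the database."""
    rng = random.Random(seed)
    return rng.sample(database, k)


def failures(selected: List[SpecTest]) -> List[str]:
    """Run the selected tests; names of failing tests go back to the designers."""
    return [t.name for t in selected if not t.run()]


if __name__ == "__main__":
    # Hypothetical database in which every seventh case exposes an error.
    database = [
        SpecTest(f"case{i}", f"capability{i % 4}", lambda i=i: i % 7 != 0)
        for i in range(20)
    ]
    print(failures(select_tests(database, 5)))
```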

This software development process purposely omits the usual 
step of unit testing. 

Dyer stated that, based upon small experiments already con- 
ducted, there is evidence that this process works. More 
extensive experiments are now being planned in which data 
will be collected to evaluate the effect of this approach on 
the reliability of the software produced. 

The audience reaction generated by this presentation was the 
largest of the entire workshop. Harlan Mills joined Mike 
Dyer in responding to the questions from the audience. 
The following points were brought out in the ensuing
discussion:

• Design engineers will be experienced in software 
design and coding; product engineers will be experienced in 
system integration and testing. Dyer and Mills indicated 
that IBM currently has on its staff skilled people who can 
perform, or can be trained to perform, in this new environ- 
ment. 

• The product engineers are not considered quality 
assurance personnel. They must perform the analysis neces- 
sary to produce the data base of test cases from the struc- 
tured specification. They must also run the tests and 
analyze the results. To function properly the product en- 
gineers must have a thorough knowledge of the customer's 
operational environment. 

• The product engineers will participate in drawing 
up the structured specification. They will reenter the 
software life cycle after the code is developed. They will 
not be allowed access to design materials during the testing 
phase. 

• Good specifications are necessary for this approach 
to be successful. The entire process is based on the use of 
a structured specification methodology. 

• This approach to software development is not
primarily aimed at cost savings. The question of whether or
not the "clean room" process will yield productivity gains 
has not been addressed. The expected benefit is in the in- 
creased reliability of the software produced. However, the 
testing phase in the "clean room" process is not expected to 
cost any more than is currently spent in the usual unit, 
functional, and acceptance testing phases. 


S. Esiingei 
CSC 
17 of 21 



• This process is also not expected to help in sizing 
software systems. 

• Mills and Dyer clarified an earlier point by saying 
that test data will not be chosen at random. Rather, random 
tests will be selected from a data base of test cases that 
are designed to test all capabilities set forth by the 
structured specification. There will be errors that are not 
found by the random selection of tests, but evidence is 
available that random testing is as good as any other form 
of testing. In fact, since in sampling theory the sample
size, and not the population size, is critical, Mills
believes that a random sample of tests can provide better
testing coverage than conventional testing.

• Evidence also exists that successful system testing 
can be performed without unit testing. 

• Mills indicated that they do not expect to attain 
perfection but that they do expect to achieve an increase in 
reliability. 

• No plans have been made to seed code with errors to 
assess the efficiency and effectiveness of the product engi- 
neers. 

• A member of the audience observed that this process 
appears to push error detection farther into the software 
life cycle. Dyer responded that this is not the case. More 
errors are expected to be found by the design engineers 
through the review process. Moreover, since the product 
engineers will be performing operational testing, they are 
expected to find errors that normally would not be uncovered 
until the software was operational. 

• To evaluate this process, a complete history of 
errors must be maintained. 
• Several members of the audience questioned the use
of mean time between failures (MTBF) as a measure of
software reliability. Mills and Dyer indicated that they
believed MTBF to be a reasonable measure and one that was
familiar to management and demanded by customers. Vic Basili
indicated that MTBF is a measure that is associated with
other measures of software quality. Another member of the
audience suggested the use of mean time to repair (MTTR).

• Mills emphasized that the "clean room" software 
development process would require some modification in pro- 
grammer behavior. Since it is known that programmers can 
write thousands of lines of correct code, the goal of pro- 
ducing first-time correct code is not unreasonable. Pro- 
grammers must be made to believe that they can do this 
without the use of the computer. Mills and Dyer hope to 
achieve this behavior modification by not allowing the pro- 
grammers to have access to the compilers. 


• Mills also stated that product engineering was de- 
vised because they felt that testing is a critical part of 
the development process. This process does not remove the 
ability to test the software; rather, design engineers are 
asked to test by thinking instead of making computer runs. 

• No projects using this approach are yet complete. 
The pilot projects are still in the process of being set up. 


• The approach is expected to work for any type of 
software application. 


• Vic Basili indicated that in recent testing
experiments he has run, the functional tests uncovered most
of the errors. However, the testers did not always recognize
that the test results had indicated errors.
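The two measures debated above are simple averages over a failure log. The Python sketch below computes both. The log format (cumulative execution hours at each failure, and hours spent on each repair) is a hypothetical assumption, and MTBF is taken here as the mean interval between successive failures, starting from time 0.

```python
from typing import List


def mtbf(failure_times: List[float]) -> float:
    """Mean time between failures.

    failure_times are cumulative execution times (e.g. hours) at which
    failures occurred, measured from the start of execution at time 0.
    """
    times = sorted(failure_times)
    starts = [0.0] + times[:-1]
    gaps = [end - start for start, end in zip(starts, times)]
    return sum(gaps) / len(gaps)


def mttr(repair_hours: List[float]) -> float:
    """Mean time to repair: average time spent correcting each failure."""
    return sum(repair_hours) / len(repair_hours)


if __name__ == "__main__":
    print(mtbf([10.0, 30.0, 60.0]))  # gaps of 10, 20, and 30 hours -> 20.0
    print(mttr([1.0, 3.0]))          # -> 2.0
```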
Bob Jones - "Selecting a Software Development Methodology"


The second speaker of the session was Bob Jones from Hughes 
Aircraft, who discussed an approach for selecting a software 
methodology. The presentation centered on a Hughes contract 
with the U.S. Air Force to define a set of tools and method- 
ologies to be used for integrated digital flight control 
software development. In response to this specific need of 
the Air Force, Hughes surveyed the environment and attempted 
to take a logical approach to the selection of tools and 
methodologies for that environment. The results of the 
study have been presented in a guidebook, a document of con- 
siderable size. Jones indicated that Hughes has started to 
collect data to evaluate the cost benefits of using the 
techniques specified by the guidebook. 

In response to questions from the audience, Jones clarified 
several points:

• The tools and methodologies recommended included 
the use of CADSAT, structured design, high-order languages, 
and modern programming languages. 

• The software produced will not be verified in 
flight. There is a standard procedure for verifying flight 
control software that uses simulated data. It is not 
planned to use the software produced by this experiment in 
flight but only to verify that it performs according to 
specification. 

• Hughes will be collecting only cost data for this 
experiment. In evaluating cost-benefit tradeoffs, the bene- 
fits obtained by following the guidebook will be determined 
by the customer. 

• A member of the audience pointed out that if the 
guidebook covered all the tools and methodologies mentioned, 
it would constitute a 4-year curriculum. Jones agreed but 
stated that the guidebook did not present detailed
instructions in the technologies.

Richard Hamilton - "Development Techniques for Generic Software"

The last speaker of the session was Richard Hamilton from 
Bell Laboratories, who spoke about a methodology for devel- 
oping generic software. His discussion centered on one 
class of application: networking with a specific protocol. 

The use of a layered approach and a finite state machine in 
implementing the X.25 protocol was presented. The complex- 
ity, size, and speed of the newly developed generic program 
were compared to an older, machine-specific X.25 protocol
program. Hamilton indicated that the complexity of the two 
programs was about the same. However, the size of the ge- 
neric program was larger and its speed was faster. 
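A protocol state machine is often coded as an explicit transition table, which keeps the program close to a state-table specification. The Python sketch below shows that table-driven style for one layer of a layered design. The states, events, and the `ProtocolLayer` class are hypothetical simplifications. They are not Hamilton's program or the X.25 state tables.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical, simplified states and events; not the actual X.25 tables.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("DISCONNECTED", "connect_request"): "AWAITING_CONFIRM",
    ("AWAITING_CONFIRM", "connect_confirm"): "CONNECTED",
    ("AWAITING_CONFIRM", "timeout"): "DISCONNECTED",
    ("CONNECTED", "data"): "CONNECTED",
    ("CONNECTED", "disconnect_request"): "DISCONNECTED",
}


class ProtocolLayer:
    """One layer in a layered design, driven by the transition table above."""

    def __init__(self, deliver: Optional[Callable[[bytes], None]] = None):
        self.state = "DISCONNECTED"
        self.received: List[bytes] = []
        # Accepted data is passed to the next layer up through a procedure
        # call; each additional layer adds one such call per message.
        self.deliver = deliver if deliver is not None else self.received.append

    def handle(self, event: str, payload: bytes = b"") -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        if event == "data":
            self.deliver(payload)
        return self.state
```

Keeping the protocol logic in a table rather than in nested conditionals makes it straightforward to check the code row by row against the specification.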

In response to questions from the audience, Hamilton clari- 
fied the following points: 

• The complexity measure used was McCabe's cyclomatic
complexity measure, which is based on the number of
branches (decision points) in the program.

• Hamilton indicated that the finite state machine 
used in the generic program was modeled as closely as pos- 
sible to the specification. 

• A member of the audience commented that there might 
be a size and/or speed tradeoff effect operating in this 
instance. That is, the increased size in terms of more mod- 
ularity might contribute to its increased speed. 

• The layered approach often requires extra overhead 
in additional procedure calls. Hamilton noted that several 
hundred extra bytes were attributable to this overhead. 

• No attempt was made to use macros to decrease the 
overhead. 
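McCabe's cyclomatic complexity is defined on a program's control-flow graph as V(G) = E − N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components (1 for a single routine). For a single-entry, single-exit routine with only two-way decisions, V(G) equals the number of decisions plus one. The Python sketch below applies the definition to a small, hypothetical control-flow graph.

```python
from typing import Dict, List


def cyclomatic_complexity(cfg: Dict[str, List[str]], components: int = 1) -> int:
    """V(G) = E - N + 2P for a control-flow graph given as adjacency lists."""
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2 * components


# Hypothetical routine: an IF-THEN-ELSE followed by a loop (two decisions).
EXAMPLE_CFG = {
    "entry": ["then", "else"],
    "then": ["loop"],
    "else": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}

if __name__ == "__main__":
    print(cyclomatic_complexity(EXAMPLE_CFG))  # 7 edges - 6 nodes + 2 -> 3
```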
EVALUATING SOFTWARE DEVELOPMENT CHARACTERISTICS:

A Comparison Of Software Errors In Different Environments 

David M. Weiss 
Naval Research Laboratory 


Introduction 


According to the mythology of computer science, the first computer program 
ever written contained an error. Error detection and error correction are now 
considered to be the major cost factors in software development [Boe72, Boe73, 
Wol74]. Much current and recent research is devoted to finding ways to
prevent software errors. One result is that techniques claimed to be effective
for preventing errors abound. Unfortunately, there have been few
empirical attempts to verify that proposed techniques work well in production 
environments. Indeed, there have been few attempts even to collect data that 
could yield insight into the issues involved. The purpose of this paper is to 
compare error data obtained from two different software development 
environments. 

To obtain data that were complete, accurate, and meaningful, a
goal-directed data collection methodology was used. The approach was to
monitor changes made to software concurrently with its development. The
results reported here were obtained by applying the methodology to three
projects at NASA/GSFC and one project at the Naval Research Laboratory
(NRL). Although all changes were monitored for most projects, we are
concerned here only with results obtained from the error data, and only with
data that may be used to compare the two environments. Readers interested in
a more detailed description of the research methodology, or in other analyses
using other data from the same sources, are referred to [Bas81, Wei79, Wei81].

Research Methodology 


The methodology is goal oriented. It starts with a set of questions to be 
answered, and proceeds step-by-step through the design and implementation of a 
data collection and validation mechanism. Analysis of the data yields answers 
to the questions of interest, and may also yield a new set of questions. The 
procedure relies heavily on an interactive data validation process; those 
supplying the data are interviewed for validation purposes concurrently with 
the software development process. The methodology has six basic steps,
described below.

1. Establish the goals of the data collection. 

Many (but not all) of our goals are related to claims made for the 
software development methodology being used. As an example, a goal 
of a particular methodology might be to develop software that is easy 
to change. The corresponding data collection goal is to evaluate the
success of the developers in meeting this goal, i.e., to evaluate the
ease with which the software can be changed.
2. Develop a list of questions of interest

Once the goals of the study are established, they are used to develop 
a list of questions to be answered by the study. In general, each 
goal will result in the generation of several different questions of 
interest. For example, if the goal is to evaluate the ease with 
which software can be changed, we may identify questions of interest 
such as: "Is it clear where a change has to be made?", "Are
changes confined to single modules?", and "What was the average effort
involved in making a change?"

3. Establish data categories 

Once the questions of interest have been established, categorization 
schemes for the changes and errors to be examined may be constructed. 
Each question generally induces a categorization scheme. If one 
question is, "How many errors result from requirements changes?", one 
will want to classify errors according to whether or not they are the 
result of a change in requirements. 

4. Design and test data collection forms 

To provide a permanent copy of the data and to reinforce the 
programmers' memories, a data collection form is used. Forms design 
was one of the trickiest parts of the studies conducted, and will not 
be discussed here. 

5. Collect and validate data 

Data are collected by requiring those people who are making software 
changes to complete a change report form for each change made, as 
soon as the change is completed. Validation consists of checking the 
forms for correctness, consistency, and completeness, and 
interviewing those filling out the forms in cases where such checks 
reveal problems. Both collection and validation are concurrent with 
software development. 

6. Analyze the data 

Data are analyzed by calculating the parameters and distributions 
needed to answer the questions of interest. 
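Steps 5 and 6 can be sketched as follows. The change-report fields, the category names, and the validation rules in this Python sketch are hypothetical simplifications, not the actual SEL or NRL forms. Reports that fail a check are flagged for a follow-up interview, and the remaining reports are tallied into a distribution over categories.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

VALID_TYPES = {"modification", "error"}  # hypothetical form choices


@dataclass
class ChangeReport:
    report_id: int
    change_type: str               # "modification" or "error"
    category: Optional[str]        # e.g. source of the error or modification
    effort_hours: Optional[float]  # effort spent making the change


def validation_problems(report: ChangeReport) -> List[str]:
    """Completeness and consistency checks; any problem triggers an interview."""
    problems = []
    if report.change_type not in VALID_TYPES:
        problems.append("unknown change type")
    if not report.category:
        problems.append("missing category")
    if report.effort_hours is None or report.effort_hours < 0:
        problems.append("missing or negative effort")
    return problems


def distribution(reports: List[ChangeReport]) -> Dict[str, float]:
    """Fraction of validated reports falling in each category."""
    valid = [r for r in reports if not validation_problems(r)]
    counts = Counter(r.category for r in valid)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}
```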

To apply the methodology to the collection of change data, the following 
definitions were used. 

A change is an alteration to baselined design, code or documentation. 

An error is a discrepancy between a specification and its implementation. 

A modification is a change made for any reason other than to correct an 
error. 
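Because a modification is, by definition, any change made for a reason other than correcting an error, each project's change count should equal its modifications plus its errors. The Python check below applies that identity to the SEL counts in Table 1, using the SEL2 modification count of 110 given in Table 5.4a. The `ProjectCounts` record is a hypothetical convenience, and the check assumes, as the table totals suggest, that each counted error corresponds to one error-correcting change.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectCounts:
    name: str
    changes: int
    modifications: int
    errors: int

    def consistent(self) -> bool:
        """Every change is either a modification or an error correction."""
        return self.changes == self.modifications + self.errors


# SEL counts from Table 1; the ARF is omitted because only error
# data were collected for it.
TABLE_1: List[ProjectCounts] = [
    ProjectCounts("SEL1", 281, 101, 180),
    ProjectCounts("SEL2", 229, 110, 119),
    ProjectCounts("SEL3", 760, 453, 307),
]

if __name__ == "__main__":
    for project in TABLE_1:
        print(project.name, project.consistent())
```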


D. Weiss 
NRL 
2 of 25 



The Projects Studied 


The studies reported here contain complete results from four different 
projects. Two different environments and several different methodologies were 
used. One environment was a research group at the Naval Research Laboratory 
(NRL), and the other was a NASA software production environment at Goddard 
Space Flight Center. Table 1 is an overview of the data collected for each 
project. For the ARF project, only error data were collected. Table 2 gives 
the values of parameters often used to characterize software development 
projects. 

The Architecture Research Facility 


The purpose of the Architecture Research Facility (ARF) project, developed 
at NRL, was to develop a facility for simulating different computer 
architectures. The simulation is based on a description of the target 
architecture written in the Instruction Set Processor language [Belli]. 

A complete description of the ARF simulator is available elsewhere [Elo79]. 
Briefly, to simulate a machine, the ARF uses a set of tables that describe the 
machine being simulated and its state, a module to perform instruction 
simulation, and a module to handle the interface to the user. The machine 
description contained in the tables is produced by an ISP compiler (an
existing compiler was used).

The ARF was developed by a team of nine people, not all full time. 
Development took about ten months and 192 people-weeks, exclusive of
consulting and secretarial support. The delivered system
contained about 20,000 lines of FORTRAN code. 

The primary goal of the ARF designers was to produce a working simulator
that would permit the simulation of small target-machine programs. The
designers also viewed the ARF development as an experiment in the application 
of software engineering technology [Elo79]. The key parts of the technology 
used are the following. 

* Rather than developing the whole system at one time, the ARF was to
be done using the family approach to software development [Par76].
The system was to be built in three main stages. Each stage would
produce a member of the ARF "family" of programs, providing different
facilities.

* The information-hiding principle [Par72a] was to be applied to
conceal design decisions that were expected to change during the 
lifetime of the ARF. 

* Informal design specifications, followed by standardized interface 
specifications, followed by high-level language coding specifications 
were written for each major module of the ARF before any code was 
written. Each specification was reviewed before its successor was 
produced.

* FORTRAN code was written from the coding specifications, compiled, 
and then reviewed by someone other than the coder prior to debugging. 
The coder debugged the code and delivered it for testing. A tester,
usually someone other than the coder or designer, was selected to test the
debugged code.
* At the possible expense of some run-time performance, several
debugging aids were designed into the system to make development
easier. These included

a. A method for detecting errors involving improper access to
table entries, known as the binding mechanism,

b. A consistent execution-time error reporting scheme for
table interface functions, and

c. A mechanism for inserting, and turning on and off,
debugging code through the use of a compile-time
preprocessor.
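The information-hiding principle and the debugging aids described above can be illustrated together in a small module. The Python sketch below is hypothetical and is not the ARF's binding mechanism or its FORTRAN code. A `RegisterTable` hides its representation behind `read` and `write` interface functions and reports every improper access with the same exception type. A module-level `DEBUG` flag stands in for debugging code switched on and off by a compile-time preprocessor.

```python
from typing import Dict, List

DEBUG = False  # stand-in for debugging code enabled or disabled at build time


class TableAccessError(Exception):
    """Single, consistent error type reported by all table interface functions."""


class RegisterTable:
    """Hypothetical table of machine registers; callers never see the dict."""

    def __init__(self, names: List[str]):
        self._values: Dict[str, int] = {name: 0 for name in names}

    def _check(self, name: str) -> None:
        # Reject access to entries that do not exist in the table.
        if name not in self._values:
            raise TableAccessError(f"no table entry named {name!r}")

    def read(self, name: str) -> int:
        self._check(name)
        if DEBUG:
            print(f"read {name} -> {self._values[name]}")
        return self._values[name]

    def write(self, name: str, value: int) -> None:
        self._check(name)
        if DEBUG:
            print(f"write {name} <- {value}")
        self._values[name] = value
```

Because callers use only `read` and `write`, the representation (here a dictionary) can change without affecting them, which is the intent of information hiding.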

The Software Engineering Laboratory 


The Software Engineering Laboratory (SEL), based at Goddard Space Flight
Center (GSFC), is a NASA-sponsored project to investigate the software
development process. A number of different software development projects are
being studied as part of the SEL investigations [Bai81, Bas77]. Studies of
changes made to the software as it is being developed constitute one part of
those investigations.

Typical projects studied by the SEL are medium-size FORTRAN programs that
compute the orientation (known as attitude) of unmanned spacecraft, based on
data obtained from sensors on board the spacecraft. Attitude solutions are
displayed to the user of the program interactively on CRT terminals. Because
the basic functions of these attitude determination programs tend to change
slowly with time, large amounts of design, and sometimes code, are often
reused from one program to the next. The programs range in size from about
20,000 to about 120,000 lines of source code. They include subsystems to
perform such functions as reading and decoding spacecraft telemetry data,
filtering sensor data, computing attitude solutions based on the sensor data,
and providing an (interactive) interface to the user.

Development is done by contract in a production environment, and is often 
separated into two distinct stages. The first stage is a high-level design 
stage. The system to be developed is organized into subsystems, and then 
further subdivided. For the purposes of the SEL, each named entity in the
system is called a component. The result of the first stage is a tree chart
showing the functional structure of the subsystem, in some cases down to the
subroutine level; a system functional specification describing, in English,
the functional structure of the system; and decisions as to what software may
be reused from other systems.

The second stage consists of completing the development of the system. 
Different components are assigned to (teams of) programmers, who write, debug, 
test, and integrate the software. Before delivery, the software must pass a 
formal acceptance test. On some projects, programmers produce no intermediate 
specifications between the functional specifications produced as part of the 
first stage and the code. Some projects produce pseudo-code specifications 
for individual subroutines before coding them in FORTRAN. During the period 
of time that the SEL has been in existence, a structured FORTRAN preprocessor 
has come into general use. 

Unlike the ARF developers, NASA is not concerned with
experimenting with new software engineering techniques. It is concerned with
introducing improved techniques into its software development process.
Nonetheless, the principal design goal of the major SEL projects is to produce
a working system in time for a spacecraft launch. Results from SEL studies of
three different NASA projects, denoted SEL1, SEL2, and SEL3, are included here.


Project   Number of   Number of       Number of
          Changes     Modifications   Errors
SEL1      281         101             180
SEL2      229         110             119
SEL3      760         453             307
ARF       -           -               143

Table 1  Overview of Data Collected



Project   Effort           Number of    Lines of    Dev. Lines    Number of
          (People-Months)  Developers   Code (K)    of Code (K)   Components
SEL1      79.0             5            50.9        46.5          502
SEL2      39.6             4            75.4        31.1          490
SEL3      98.7             7            85.4        78.6          639
ARF       44.3             9            21.8        21.8          253

Table 2  Summary of Project Information



Project   Errors per K Lines   Errors Resulting from Change   Repeated Error Ratio
          of Developed Code    (as Percentage of              (Average Number of
                               Nonclericals)                  Corrections per Error)
SEL1      3.9                  5                              1.02
SEL2      3.8                  14                             1.08*
SEL3      3.9                  12                             1.05
ARF       6.6                  13                             1.007

* Upper bound. The exact number of repeated errors for SEL2 is unknown.
  By conservative means, the ratio could be estimated as 1.04.

Table 3  Measures of Erroneous Change
Results


The results presented here are derived from analyses of several different 
data parameters and distributions. Table 3 shows error density, errors 
resulting from change, and repeated error ratio for each project. These 
parameters indicate that for all projects most changes were made correctly on 
the first attempt. 
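The error densities in Table 3 can be recomputed from the error counts in Table 1 and the developed lines of code in Table 2. The same arithmetic gives the change densities listed in Table 5.5. The short Python check below uses only numbers from those tables. The `densities` helper is introduced here for illustration.

```python
from typing import Dict, Optional, Tuple

# (errors, changes, developed K lines of code) taken from Tables 1 and 2.
PROJECTS: Dict[str, Tuple[int, Optional[int], float]] = {
    "SEL1": (180, 281, 46.5),
    "SEL2": (119, 229, 31.1),
    "SEL3": (307, 760, 78.6),
    "ARF": (143, None, 21.8),  # change data were not collected for the ARF
}


def densities(errors: int, changes: Optional[int], kloc: float) -> Tuple[float, Optional[float]]:
    """Errors and changes per K lines of developed code, to one decimal place."""
    error_density = round(errors / kloc, 1)
    change_density = round(changes / kloc, 1) if changes is not None else None
    return error_density, change_density


if __name__ == "__main__":
    for name, (errors, changes, kloc) in PROJECTS.items():
        print(name, *densities(errors, changes, kloc))
```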

Figures 1 and 2 are an overview of the change distributions for the SEL 
projects (recall that data on modifications are not available for the ARF
project). Figure 3 shows sources of modifications, i.e. reasons for modifying 
the software, and figure 4 shows sources of nonclerical errors. Although 
there were a significant number of requirements changes for two of the SEL 
projects, none of the projects show a significant number of errors resulting 
from incorrect or misunderstood requirements. 

For all projects, the major source of errors was the design and 
implementation of single components. (For these projects, a single component 
is nearly always a FORTRAN subroutine or block data.) Relatively few errors 
were the result of misunderstandings of requirements, specifications,
programming language or compiler, or software or hardware environment.

Aspects of the design involving more than one component were also not a
major source of errors. Figure 5 shows a continuation of the same pattern:
for most projects, interfaces were not a significant source of errors.

A further categorization of design and implementation errors, including
both single and multi-component design errors, is shown in figure 6. The
pattern for the SEL and ARF projects is quite different here; relatively few 
ARF errors involved the use (including definition, representation, and access) 
of data. For the SEL projects, data errors were a significant fraction of 
design and implementation errors. 

A direct measure of ease of error correction is shown in figure 7. For
all projects, the overwhelming majority of errors took less than a day of 
effort to correct. Indeed, most error corrections took an hour or less of 
effort. 

Figure 8 is a measure of locality of errors with respect to project 
components. Only components that required at least one error correction (one 
fix) are represented. The majority of such components required no more than 
one correction. For all projects, 80% or more of such components were 
corrected at most three times. 
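The locality measure in figure 8 can be computed from a log of error corrections, each tagged with the component it touched. In this Python sketch the correction log is hypothetical. The sketch builds the distribution of fixes per component, counting only components fixed at least once, and the fraction of those components corrected at most three times.

```python
from collections import Counter
from typing import Dict, Iterable


def fixes_per_component(corrections: Iterable[str]) -> Dict[int, int]:
    """Map number of fixes -> number of components with that many fixes."""
    per_component = Counter(corrections)          # component -> fixes
    return dict(Counter(per_component.values()))  # fixes -> components


def fraction_at_most(corrections: Iterable[str], limit: int = 3) -> float:
    """Fraction of corrected components fixed no more than `limit` times."""
    per_component = Counter(corrections)
    fixed = len(per_component)  # assumes at least one correction was logged
    return sum(1 for n in per_component.values() if n <= limit) / fixed
```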

Locality of errors with respect to project subsystem (project module for
the ARF) is shown in figure 9. The distributions here show the reverse
pattern of those in figure 8, i.e. most corrections are clustered in a few 
subsystems (modules). 

Conclusions


The ARF and SEL projects involved different applications and were
developed in different environments, using different methodologies, people with
different backgrounds, and different computer systems. Despite these 
differences there are a number of similarities between the two, as listed in 
the following. 
1. There is a common pattern in the distributions of error sources.
The principal error source is the design and implementation of
single routines. Requirements, specifications, and interface
misunderstandings are all minor sources of errors.

2. Few errors are the result of changes, few errors require more 
than one attempt at correction, and few error corrections result 
in other errors. 

3. Relatively few errors take more than a day to correct. 

These similarities may be explained by different factors in the different 
environments. The SEL projects may be viewed as redevelopments. Much of the 
same design and some of the same code is reused from one project to the next. 
As a result of experience with the application, the changes most likely to 
occur from one project to the next have been identified by the designers. The 
systems are now designed so that these changes are easy to make. Confirmation 
of this explanation was provided by one of the primary system designers in 
discussions held after the data were analyzed. 

In the ARF environment, the explicit use of techniques to identify and 
design for potential changes is a likely contributing factor to the 
similarities in the distributions. 

Common factors to both the SEL and ARF projects were the stability of the 
hardware and software supporting the development and the familiarity of the 
programmers with the language they were using. 

The most striking difference between the ARF and SEL projects is in the 
proportion of intended use to data errors. The ARF project has a considerably 
smaller proportion of data errors than the SEL projects. One reason for this 
may be the conscious attempt of the ARF developers to apply abstract data 
typing and strong typing in their design. 
PURPOSE OF RESEARCH


FIND A WAY OF EVALUATING SOFTWARE DEVELOPMENT METHODOLOGIES 


* LEARN ABOUT THE SOFTWARE DEVELOPMENT PROCESS 


* LEARN ABOUT MEASURING THE SOFTWARE DEVELOPMENT PROCESS 


APPROACH 


STUDY CHANGES USING GOAL-DIRECTED DATA COLLECTION 
RESEARCH METHODOLOGY DEVELOPED


ESTABLISH GOALS 

EXAMPLE: EVALUATE THE DIFFICULTY OF CHANGING SOFTWARE 

DEFINE QUESTIONS OF INTEREST 

EXAMPLES: IS IT CLEAR WHERE A CHANGE HAS TO BE MADE? 

ARE CHANGES CONFINED TO SINGLE MODULES? 

WHAT WAS THE AVERAGE EFFORT INVOLVED IN MAKING A 
CHANGE? 

DESIGN DATA COLLECTION FORM 

COLLECT AND VALIDATE DATA CONCURRENTLY WITH DEVELOPMENT 
ANALYZE DATA 
TYPES OF CHANGES


* DEF: A CHANGE IS AN ALTERATION TO (BASELINED) DESIGN, CODE, OR 

DOCUMENTATION. 

* DEF: AN ERROR IS A DISCREPANCY BETWEEN A SPECIFICATION AND ITS 

IMPLEMENTATION. 

* DEF: A MODIFICATION IS A CHANGE MADE FOR ANY REASON OTHER THAN TO

CORRECT AN ERROR. 

* CHANGES = MODIFICATIONS + ERROR CORRECTIONS 
SUBCATEGORIES OF CHANGES


* MODIFICATIONS 

IMPLEMENTATION OF REQUIREMENTS CHANGE 
OPTIMIZATIONS 

IMPROVEMENTS OF USER SERVICES 

IMPROVEMENT OF CLARITY, MAINTAINABILITY, OR DOCUMENTATION 
ADAPTATION TO ENVIRONMENT CHANGE

* ERROR CORRECTIONS 

CLERICAL ERRORS 
NON-CLERICAL ERRORS 

REQUIREMENTS INCORRECT OR MISINTERPRETED 

SPECIFICATIONS INCORRECT OR MISINTERPRETED 

DESIGN ERROR INVOLVING SEVERAL COMPONENTS 

ERROR IN DESIGN/IMPLEMNTATION OF A SINGLE COMPONENT 

ERROR IN USE OF PROGRAMMING LANG OR COMPILER 

MISUNDERSTANDING OF ENVIRONMENT 
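The change taxonomy above, including the rule CHANGES = MODIFICATIONS + ERROR CORRECTIONS, can be expressed as two enumerations plus a tally. A sketch only: the enum, member, and function names are hypothetical; the category labels are taken from the viewgraphs.

```python
from enum import Enum


class Modification(Enum):
    REQUIREMENTS_CHANGE = "implementation of requirements change"
    OPTIMIZATION = "optimization"
    USER_SERVICES = "improvement of user services"
    CLARITY = "improvement of clarity, maintainability, or documentation"
    ENVIRONMENT_CHANGE = "adaptation to environment change"


class ErrorCorrection(Enum):
    CLERICAL = "clerical error"
    REQUIREMENTS = "requirements incorrect or misinterpreted"
    SPECIFICATIONS = "specifications incorrect or misinterpreted"
    MULTI_COMPONENT_DESIGN = "design error involving several components"
    SINGLE_COMPONENT = "error in design/implementation of a single component"
    LANGUAGE_OR_COMPILER = "error in use of programming language or compiler"
    ENVIRONMENT = "misunderstanding of environment"


def tally(changes):
    """Count changes by kind; every change is either a modification or an error correction."""
    changes = list(changes)  # allow any iterable; we traverse it several times
    modifications = sum(isinstance(c, Modification) for c in changes)
    errors = sum(isinstance(c, ErrorCorrection) for c in changes)
    nonclerical = sum(
        isinstance(c, ErrorCorrection) and c is not ErrorCorrection.CLERICAL
        for c in changes
    )
    return {
        "changes": modifications + errors,  # CHANGES = MODIFICATIONS + ERROR CORRECTIONS
        "modifications": modifications,
        "errors": errors,
        "nonclerical_errors": nonclerical,
    }


sample = [Modification.OPTIMIZATION, ErrorCorrection.CLERICAL, ErrorCorrection.REQUIREMENTS]
print(tally(sample))
```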
Project   Number of   Number of       Number of
          Changes     Modifications   Errors

SEL1      281         101             180
SEL2      229         110             119
SEL3      760         453             307
ARF       -           -               143
A-7       38          9               79


Table 5.4a Overview of Data Collected



Project   Effort            Number of    Lines of    Dev. Lines of   Number of
          (People-Months)   Developers   Code (K)    Code (K)        Components

SEL1      79.0              5            50.9        46.5            502
SEL2      39.6              4            75.4        31.1            490
SEL3      98.7              7            85.4        78.6            639
ARF       44.3              9            21.8        21.8            253
A-7

Table 5.4b Summary of Project Information
Project   Changes Per K Lines   Errors Per K Lines    Error-to-Mod Ratio
          of Developed Code     of Developed Code     (Nonclericals Only)

SEL1      6.0                   3.9                   1.3
SEL2      7.4                   3.8                   .92
SEL3      9.7                   3.9                   .54
ARF       -                     6.6                   -


Table 5.5 Change and Error Densities
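Table 5.5 and the changes-per-person-month column of Table 5.8 are ratios of the counts in Tables 5.4a and 5.4b. As a consistency check (not the authors' procedure), the sketch below recomputes them for the three SEL projects; it reproduces the published values after rounding to one decimal place. The errors-per-person-month column of Table 5.8 is not recomputed, because it does not follow from the error totals in Table 5.4a.

```python
# SEL counts from Table 5.4a; developed KLOC and effort (people-months) from Table 5.4b.
PROJECTS = {
    "SEL1": {"changes": 281, "errors": 180, "dev_kloc": 46.5, "effort_pm": 79.0},
    "SEL2": {"changes": 229, "errors": 119, "dev_kloc": 31.1, "effort_pm": 39.6},
    "SEL3": {"changes": 760, "errors": 307, "dev_kloc": 78.6, "effort_pm": 98.7},
}


def densities(p: dict) -> dict:
    """Change and error densities, rounded to one decimal place as in the tables."""
    return {
        "changes_per_kloc": round(p["changes"] / p["dev_kloc"], 1),
        "errors_per_kloc": round(p["errors"] / p["dev_kloc"], 1),
        "changes_per_person_month": round(p["changes"] / p["effort_pm"], 1),
    }


for name, data in PROJECTS.items():
    print(name, densities(data))
```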

Project   Erroneous Change Rate   Errors Resulting from   Repeated Error Ratio
          (Ratio of Changes       Change (as Percentage   (Average Number of
          Resulting in Errors     of Nonclericals)        Corrections per Error)
          to All Changes)

SEL1      .025                    5                       1.02
SEL2      .061                    14                      1.08*
SEL3      .041                    12                      1.05
ARF       -                       13                      1.007


* Upper bound. The exact number of repeated errors for SEL2 is unknown.
By conservative means, the ratio could be estimated as 1.04.


Table 5.6 Measures of Erroneous Change
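The three columns of Table 5.6 are simple ratios, defined in its column headers. The sketch below states them as functions; the function names and the sample counts are hypothetical and are not SEL or ARF data.

```python
def erroneous_change_rate(changes_resulting_in_errors: int, total_changes: int) -> float:
    """Ratio of changes that resulted in errors to all changes."""
    return changes_resulting_in_errors / total_changes


def errors_from_change_pct(errors_caused_by_changes: int, nonclerical_errors: int) -> float:
    """Errors resulting from change, as a percentage of nonclerical errors."""
    return 100.0 * errors_caused_by_changes / nonclerical_errors


def repeated_error_ratio(total_corrections: int, total_errors: int) -> float:
    """Average number of corrections per error; 1.0 means every error was fixed on the first try."""
    return total_corrections / total_errors


# Hypothetical counts for illustration only.
print(erroneous_change_rate(10, 400))
print(errors_from_change_pct(12, 100))
print(repeated_error_ratio(105, 100))
```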
Project   Number of People   Errors per Person

SEL2      4                  25
SEL1      5                  26
SEL3      7                  44
ARF       9                  10


Table 5.7 Errors Per Person By Number Of People


Project   Effort            Errors per      Changes per
          (People-Months)   Person-Month    Person-Month

SEL2      39.6              2.4             5.8
ARF       44.3              2.1             -
SEL1      79.0              1.7             3.6
SEL3      98.7              3.1             7.7


Table 5.8 Errors Per Effort By Effort
Figure 5.2 Changes (Clerical Errors Excluded).
Key to Figure 5.3

Design    Modifications caused by changes in design

Debug     Modifications to insert or delete debug code

Env       Modifications caused by changes in the hardware or software environment

PE        Planned enhancements

Req       Modifications caused by changes in requirements or functional specifications

Unknown   Causes of these modifications are not known


Figure 5.3. Sources of Modifications.
Figure 5.5. Sources of Nonclerical Errors.
Figure 5.7 Interface Errors.
Key to Figure 5.6 (panels: SEL1, SEL2, SEL3, ARF; axis: Type of Error)

Data           Error in the use of data

Intended Use   Error in intended function, i.e., program behavior does not
               correspond to the intended use of the program


Figure 5.6 Sources of Design/Implementation Errors.
Figure 5.10. Effort to Fix Nonclerical Errors (panel: ARF).
Figure 5.15. Frequency Distribution of Fixes.


CONCLUSIONS ABOUT SOFTWARE DEVELOPMENT COMMON TO NRL AND NASA/GSFC 

* PRINCIPAL ERROR SOURCE IS DESIGN AND IMPLEMENTATION OF SINGLE ROUTINES 

REQUIREMENTS, SPECIFICATIONS, AND INTERFACE MISUNDERSTANDINGS ARE 
MINOR SOURCES OF ERRORS. 

* FEW ERRORS ARE THE RESULT OF CHANGES, FEW ERRORS REQUIRE MORE THAN 
ONE ATTEMPT AT CORRECTION, AND FEW ERROR CORRECTIONS RESULT IN OTHER 
ERRORS. 

* RELATIVELY FEW ERRORS TAKE MORE THAN A DAY TO CORRECT. 

DIFFERENCES BETWEEN ARF AND SEL SOFTWARE DEVELOPMENT 

* THE PROPORTION OF ARF ERRORS INVOLVING DATA IS CONSIDERABLY SMALLER 
THAN THE CORRESPONDING PROPORTION FOR SEL ERRORS 


D. Weiss 
NRL 
25 of 25 



METHODOLOGY EVALUATION:
EFFECTS OF INDEPENDENT VERIFICATION 
AND INTEGRATION ON ONE CLASS OF 
APPLICATION 


Jerry Page 

COMPUTER SCIENCES CORPORATION 
and 

GODDARD SPACE FLIGHT CENTER 
SOFTWARE ENGINEERING LABORATORY 


Prepared for the 
NASA/GSFC 

Sixth Annual Software Engineering Workshop 



J. Page 
CSC 
1 of 47 


METHODOLOGY EVALUATION: 

EFFECTS OF 

INDEPENDENT VERIFICATION 
AND INTEGRATION ON 
ONE CLASS OF APPLICATION 


174-SEL-(33) 1 


Viewgraph 1: Title 


One area of study in the Software Engineering Laboratory
(SEL) is methodology. This presentation describes the
effects of an independent verification and integration (V&I)
methodology on one class of application. V&I is the name
that we will use for what some call independent verification
and validation (IV&V) and others call verification and
validation (V&V). "One class of application" means the
development of solutions for a set of similar problems
(ground-based support for satellite operations) that are
developed in the same computing environment--simply put, a
specific problem in a specific environment.


Goddard Space Flight Center, SEL-81-104, "The Software
Engineering Laboratory" (Software Engineering Laboratory
Series), D. N. Card et al., February 1982.
Viewgraph 2: Resource Profiles

Why use a V&I methodology? Why have we experimented with a
V&I methodology? To introduce V&I methodology, let me show
you resource profiles for four real projects developed for
the Goddard Space Flight Center (GSFC) by Computer Sciences
Corporation (CSC) and monitored closely by the SEL. These
resource profiles show technical hours charged to the
projects by week. Technical hours are those hours charged by
the programmers and the first-line managers. First-line
managers are those managers who make decisions, set
priorities, and solve problems daily, as opposed to
higher-level managers who receive weekly or less frequent
progress reports. These resource profiles do not include
service charges, which amount to approximately 13 percent of
the hours charged to a project. Service hours include those
hours charged by librarian, secretarial, technical,
publications, and data technician support groups.

In these profiles, design activity starts at the far
left-hand side and continues throughout the project at
decreasing levels. The first vertical line indicates the
conclusion of a series of requirements analysis and critical
design reviews. It is the point at which implementation and
corresponding testing are allowed to begin. The second
vertical line is the point at which implementation (coding)
is supposed to be complete and system testing starts. The
third vertical line is the point at which the software is
supposed to be ready (for operation) and acceptance testing
starts. The fourth vertical line indicates the end of
acceptance testing and the beginning of maintenance (by
another group).

Most people who measure software products apply many
measures to the software product from the point at which it
enters the maintenance and operation (M&O) phase. We do too,
but since we have no responsibility for the software once it
is transferred to the maintenance group and because it is
more difficult to collect data through another group, we
apply many of our measures one or two phases earlier, i.e.,
from the beginning of acceptance testing or from the
beginning of system testing.


As you can see from three of these four profiles (excluding
the one in the upper left-hand quadrant), the peak effort is
at the start of acceptance testing. Some of the reasons
that the peak effort occurs at that point are as follows:


• All the projects grow between 15 and 40 percent
after the start of implementation because of
requirements escalation.


• These projects cross two or three funding periods.
This puts some constraint on how much work can be
done in any one funding period.

• Management problems exist. The profile in the
lower left-hand quadrant shows the application of
the "mythical man-month."

• There is a hard deadline (launch of a satellite).

• The computers are not very reliable (6- to 8-hour
mean time to failure).

We know what we are doing during that peak effort (the peak 
at the third vertical line). A large fraction of our work 
there is correcting errors. 

It is commonly accepted that the cost to correct an error
approximately doubles as it enters each new phase of the
development life cycle. For example, if an error originates
in the requirements phase (the phase preceding design) and
that requirements error gets designed, the cost to correct
the error during design will be one to two times the cost of
correcting it in the requirements phase. If the designed
requirements error gets implemented, the cost to correct the
error during implementation will be two to four times the
requirements-phase cost. If the implemented requirements
error enters the system testing phase, the cost to correct
the error will be four to eight times as much. If it enters
the acceptance testing phase, the cost will be 8 to 16 times
as much, and if it enters the M&O phase, the cost will be
16 to 32 times as much (for one simplified example, see
Figure 1).

The same progression holds for errors that originate in
design and implementation. Therefore, during the M&O phase,
even implementation errors are costly to correct; they cost
four to eight times as much to correct during the M&O phase
as during the implementation phase.
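The doubling rule can be written as a function of how many phase boundaries an error crosses before it is corrected. The sketch below is one reading of the ranges quoted in the text, with the phase list and function name chosen for illustration: an error corrected n phases after the phase in which it originated costs between 2^(n-1) and 2^n times as much as correcting it in its own phase.

```python
PHASES = ["requirements", "design", "implementation",
          "system testing", "acceptance testing", "M&O"]


def relative_cost_range(origin: str, corrected_in: str) -> tuple[int, int]:
    """Range of correction cost relative to fixing the error in its phase of origin."""
    o = PHASES.index(origin)
    c = PHASES.index(corrected_in)
    if c < o:
        raise ValueError("an error cannot be corrected before it originates")
    n = c - o  # number of phase boundaries the error has crossed
    if n == 0:
        return (1, 1)
    return (2 ** (n - 1), 2 ** n)


for phase in PHASES:
    print(f"{phase:20s} {relative_cost_range('requirements', phase)}")
```

The last line printed (16 to 32 for a requirements error reaching M&O) and the implementation-to-M&O range of 4 to 8 both match the figures in the two paragraphs above.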

We do not need a general hypothesis to know that it costs
more to correct errors in the later stages of development.
Our own data collected over the last 5 years shows that the
cost of correcting errors increases from one phase of
development to the next. SEL data shows that (regardless of
error type) the average error discovered during the
acceptance testing phase costs more to correct than the
average error discovered during the system testing phase and
that the average error discovered during the system testing
phase costs more to correct than the average error
discovered during the implementation phase. The increase in
the average effort to correct an error from one phase to the
next varies from project to project, but it frequently
approximates a doubling of effort.

Common sense indicates that there will be cost increases for
changes to the evolving product as development progresses
through the life cycle. Certainly, in this environment
there are several transfers of responsibility: from the
requirements team to the development team, from the
designers to the implementers, from the implementers to the
testers, and finally, from the development team to the
maintenance team. These are not complete transfers of
responsibility; instead, the team size increases or decreases
at different points in the development life cycle. Because a
system is never completely or accurately documented and
because few people can instantaneously absorb the content of
the documentation, new team members will require additional
time to become familiar with the system. Therefore, functions
will increase in cost when new members or groups become
responsible for them.

Figure 1. Cost of Correcting Software Errors (y-axis: relative cost to fix error).

Since the average development team size is six members,
prematurely removing one member from the team always affects
the schedule adversely. If the schedule cannot be adjusted
(adjustments are more difficult late in the life cycle
because of launch deadlines), then a replacement member must
be added to the team. This replacement increases cost, and
it does not solve the schedule problem completely unless the
replacement is more productive than the individual who was
replaced.

We know that we have to improve our methodology, in both
management and development practices, to move
error-correction efforts earlier in the development life
cycle, closer to the commission of the errors.

We know this from the advocates of V&I methodology, from our
own SEL data, and from common sense. To save money, we must
move the peak effort away from the start of acceptance
testing (the third vertical line in the resource profile) and
nearer to the design phase (between the first and second
vertical lines in the resource profile). For example, we
spend approximately 30 percent of our dollars on system and
acceptance testing (the area between the second and fourth
vertical lines). If 50 percent of that expenditure is for
error correction (15 percent of dollars), then by moving
that error-correction effort into the implementation phase,
we will reduce the cost of that effort by approximately
one-half; i.e., we will save approximately 7.5 percent of
our development cost.
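The savings estimate is a chain of three fractions. The sketch below makes that arithmetic explicit; the function and parameter names are illustrative, and the inputs are the 30-percent, 50-percent, and one-half figures from the paragraph above.

```python
def savings_from_earlier_correction(test_share: float,
                                    correction_fraction: float,
                                    cost_reduction: float) -> float:
    """Fraction of total development cost saved by moving late error correction earlier.

    test_share: fraction of development dollars spent on system and acceptance testing
    correction_fraction: fraction of that testing spend that goes to error correction
    cost_reduction: fractional reduction in correction cost once it is moved earlier
    """
    late_correction_share = test_share * correction_fraction  # 0.30 * 0.50 = 0.15 in the text
    return late_correction_share * cost_reduction


saved = savings_from_earlier_correction(0.30, 0.50, 0.50)
print(f"{saved:.1%} of development cost")
```

Because the result is a product, halving any one input halves the savings; the estimate is only as good as the 50-percent share assumed for error correction.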
Viewgraph 3: Scaled Resource Profiles

These resource profiles are scaled so that the start of
acceptance testing is 1 on the x-axis. The technical hours
spent each week (the y-axis) are divided by the developed
lines of code (in thousands). The scaled resource profiles
show technical hours per thousand lines of developed code by
fraction of the development life cycle. The unscaled resource
profiles (see viewgraph 2) show technical hours by week of
the development life cycle.
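The scaling described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the talk: weeks are numbered from 1, and the x-value is the week number divided by the week in which acceptance testing starts. The function name and the sample profile are hypothetical.

```python
def scale_profile(weekly_hours: list[float],
                  acceptance_start_week: int,
                  developed_kloc: float) -> list[tuple[float, float]]:
    """Convert a weekly resource profile to (life-cycle fraction, hours per KLOC) points.

    Weeks are numbered from 1; the week in which acceptance testing starts maps to x = 1.0.
    """
    return [
        (week / acceptance_start_week, hours / developed_kloc)
        for week, hours in enumerate(weekly_hours, start=1)
    ]


# Hypothetical 4-week profile for a 50-KLOC project whose acceptance testing starts in week 4.
print(scale_profile([100.0, 150.0, 200.0, 250.0], 4, 50.0))
```

Because both axes are normalized, projects of different size and duration can be overlaid on one plot.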
DEVELOPMENT ENVIRONMENT

CHARTER: DESIGN, IMPLEMENT, TEST, DOCUMENT

TYPE OF SOFTWARE: SCIENTIFIC, GROUND-BASED, NEAR-REAL-TIME,
INTERACTIVE GRAPHIC

LANGUAGES: 85% FORTRAN, 15% ASSEMBLER MACROS

MACHINES: IBM S/360-75 AND -95, BATCH WITH TSO


PROCESS CHARACTERISTICS           AVERAGE    HIGH     LOW

DURATION (MONTHS)                 15.6       20.5     12.9
EFFORT (STAFF-YEARS)              8.0        11.5     2.4
SIZE (1000 LOC)
  DEVELOPED                       57.0       111.3    21.5
  DELIVERED                       62.0       112.0    32.8
STAFF (FULL-TIME EQUIV.)
  AVERAGE                         5.4        6.0      1.9
  PEAK                            10.0       13.9     3.8
  INDIVIDUALS                     14         17       7
APPLICATION EXPERIENCE (YEARS)
  MANAGERS                        5.8        6.5      5.0
  TECHNICAL STAFF                 4.0        5.0      2.9
OVERALL EXPERIENCE (YEARS)
  MANAGERS                        10.0       14.0     8.4
  TECHNICAL STAFF                 8.5        11.0     7.0
Viewgraph 4: Development Environment


I will talk about four projects today. Two went into
operation about 2 years ago; the other two went into
operation about 3 months ago. A V&I methodology was applied
to the last two. The last two projects will be labeled V&I 1
and V&I 2 on the following viewgraphs. The projects that
became operational 2 years ago will be labeled Past 1 and
Past 2.

Date                 Past 1      Past 2       V&I 1       V&I 2

Development start    May 1978    June 1978    Oct. 1979   Oct. 1979
Maintenance start    Oct. 1979   Aug. 1979    June 1981   May 1981
Operation start      Feb. 1980   Oct. 1979    Aug. 1981   Aug. 1981
M&O end              Active      Sept. 1980   Active      Active

This viewgraph shows the average value of each development
characteristic and the high and low values of the development
characteristics from 12 projects in one class of application.
The high or the low values themselves do not represent one
project but show the most and least of any characteristic
attributed to any of the 12 projects. The four projects that
I will talk about are included in these statistics.

What is our development environment like? Our development
teams design, implement, test, and document software that is
scientific, ground-based, near-real-time, and interactive
graphic. The software is 85 percent FORTRAN, 1 percent
assembler, and 14 percent assembler macros. The assembler
macros are required for the graphics capability. The
software is developed on the IBM S/360-75 and -95, which are
batch oriented with a timesharing option (TSO).

This is an operations environment, not a development
environment. In this environment, the developers have access
to the IBM S/360-95 via a Remote Job Processing (RJP) terminal
and via TSO terminals. The developers use the IBM S/360-75
primarily in programmer-present blocks of time for
integration and system testing via a graphics device. The
IBM S/360-95 is the primary day-to-day satellite operations
machine. When a hardware failure occurs, the developers lose
access to the machine via the RJP and TSO terminals and must
immediately relinquish their programmer-present time (if
they have it) on the IBM S/360-75 so that operations
activities can continue with minimal interruption. Since
programmer-present blocktime is scheduled weekly and since
the schedule is usually fully booked, IBM S/360-95 hardware
failures always affect the development schedule adversely,
especially late in the development life cycle.

In addition, the IBM S/360-75 is the primary satellite
launch and launch-simulation operations machine. It is not
unusual to have launches monthly, and frequently they are
delayed on a day-by-day basis for 1 to 2 weeks or on a
week-by-week basis for 2 to 4 weeks. When this happens,
additional simulations are scheduled and/or additional
mission planning machine time is required. Again, the
developers must relinquish scheduled programmer-present
block times.

We estimate that 20 to 40 percent of scheduled
programmer-present blocktime is lost because of hardware
failures on both machines and because of launch delays. When
frequent hardware failures and launches occur during the
later stages of a development project, they can contribute
significantly to the peak effort at the start of acceptance
testing, because the lost machine time must be made up to
complete the project on schedule.

On the average, the development process takes 15.6 months,
requires 8 staff-years of effort, develops 57,000 lines of
code, and delivers 62,000 lines of code. Some amount of old
code is used in each of these projects. The average staff
size is 5.4 people and peaks at 10 people (full-time
equivalents). Fourteen individuals are usually involved; this
figure includes the first-line managers, i.e., those managers
who make decisions, set priorities, and solve problems on a
daily basis. For this application, on the average, the
managers have 5.8 years of experience and the technical
staff has 4 years. The technical staff includes the managers
(approximately 30 percent). The managers have 10 years of
professional experience overall, and the technical staff
has 8.5 years of professional experience.
Viewgraph 5: V&I Experiment

Why use a V&I methodology? It has often been claimed that 
the use of a V&I team would solve some of our problems. 

What we want to know from this experiment is "Does the use
of an independent V&I team improve our development process
and product?" To test this hypothesis, we will apply seven
measures. These measures, however, are not completely
independent of each other. They measure, in different ways,
the occurrence of two basic properties:

1. When errors are discovered earlier, they are less
costly to correct.

2. The use of a V&I methodology helps to discover errors
earlier.

The seven measures with explanations follow. 

1. Decrease requirements ambiguities and misinterpretations.
This will save time and money, especially in later stages of
development. Overall, these are the most expensive errors to
correct because requirements are the starting point for the
development life cycle.

To evaluate this measure, the development error data that is
collected by the SEL from the development and V&I teams from
the start of implementation through the completion of
acceptance testing will be examined. In this experiment, the
use of a V&I methodology is not expected to reduce the
development error rate; rather, it is expected to help
discover errors earlier. If the use of a V&I methodology
provides this benefit, a larger fraction of requirements
errors will be detected during the design phase, in which
the SEL has no formal process for recording errors, and
therefore, fewer requirements errors (a smaller percentage
of total errors) will remain to be discovered during the
formal reporting period.^ Compared with the past projects, a
50-percent decrease in the percentage of requirements errors
reported by the development and V&I teams will be a clear
indication of success for this measure. In addition, since
the V&I team will pursue the resolution of unspecified and
ambiguous requirements, fewer of these requirements problems
are expected in the later stages of development.

2. Decrease design errors. This will save time and money in
later stages of development. Design errors are the second
most expensive to correct.

To evaluate this measure, the development error data will be
used to compute the percentage of the design errors that are
complex design errors. Complex design errors are
many-component errors, whereas simple design errors are
single-component errors. A component is a subroutine or
shared block of code. Simple design errors are frequently
related to (1) wrong assumptions about data values and
structures, e.g., integer versus real variables, 2-byte
versus 4-byte variables, location in buffer, or length of a
format; (2) lapses in memory, e.g., missing items
(declarations, dimensions, subscripts, statements, or
counter incrementers) or incorrect variable names (not
misspellings); or (3) incorrect interpretation of
computations, e.g., wrong sense of direction (sign
operator), factors of 2 or root 2, or wrong order of steps.
Complex design errors are frequently related to interfaces
and operational considerations and, therefore, affect
modules (several components). Since interfaces and
operational aspects receive more scrutiny and high-level
attention, complex design errors are more likely to be
discovered during design reviews, which for the most part
occur outside the formal error reporting period. The simple
design errors, which are found in the detail of the design,
are less likely to be found by a small V&I team
(approximately 15 percent of development effort). If the use
of a V&I methodology helps to discover complex design errors
earlier, a larger fraction of the complex errors will be
detected during the design phase, and therefore, fewer
complex design errors (a smaller percentage) will remain to
be discovered during the formal reporting period. Compared
with the past projects, a 50-percent decrease in the
percentage of complex design errors reported by the
development and V&I teams will be a clear indication of
success for this measure.

^Formal error reporting for development is keyed to
machine-readable code that, in this environment, is the
executable source code. Therefore, formal error reporting
occurs only from the start of implementation through the
completion of acceptance testing. Maintenance error data is
collected from the maintenance group in a slightly different
form.

3. Decrease the cost of correcting errors. Both the advocates
of V&I methodology and our own SEL data indicate that
correcting errors one life-cycle phase earlier will produce
significant savings.

To evaluate this measure, the relative cost of correcting
errors before and after acceptance testing started will be
computed.^ If the use of a V&I methodology reduces the cost
of correcting errors, the developers will spend less effort
per error in the later stages of development. Compared with
the past projects, a 20- to 25-percent reduction in the
relative cost of correcting errors after acceptance testing
started will be a positive indication of success for this
measure. Maintenance error data that is collected by the SEL
from the maintenance groups will also be used.

^Here, the relative cost of correcting errors is computed by
tabulating the effort to correct errors (reported by the
development teams) in each phase, computing the percentage
of error-correction effort that occurred in each phase, and
then dividing the error-correction effort percentage of each
phase by the corresponding percentage of errors found in
that phase.

4. Decrease the cost of system and acceptance testing. If
the first three items occur, less effort will be required in
these phases.

To evaluate this measure, the percentage of the development
cost required to complete system and acceptance testing will
be computed.^ If the use of a V&I methodology helps to
discover errors closer to the phase in which they originated,
(1) the development teams will spend less time correcting
errors during system testing and the system tests will be
completed sooner, reducing the cost of system testing, and
(2) the development teams will need only to prepare for and
demonstrate the acceptance tests, reducing the cost of
acceptance testing. Compared with the past projects, a
smaller percentage of development cost for system and
acceptance testing will be a positive indication of success
for this measure. If the cost is less than the average cost
for this application, it will be a clear indication of
success.

5. Increase the early discovery of errors. This will save
time and money in later stages of development, as stated
above. It will also improve the reliability of the software,
or at least improve confidence in its reliability, since
error rates will be lower (or the mean time between failures
will be greater) in the later stages of development. To
evaluate this measure, the development and maintenance error
data will be used to compute the percentages of errors that
were discovered before and after acceptance testing started.
If the use of a V&I methodology helps to discover errors
earlier, most of the errors will be discovered before
acceptance testing starts. Compared with the past projects, a
50-percent reduction in the percentage of errors discovered
after acceptance testing started will be a clear indication
of success for this measure.

^The development cost is computed by weighting the hours
charged to a project by the different responsibilities of
the personnel assigned to the project. A manager's hours are
multiplied by 1.5; a programmer's hours are multiplied by
1.0; support service personnel's hours are multiplied by
0.5.

6. Improve the quality of the software put into operation.
This will decrease maintenance costs. In general, the use of
a V&I methodology will be most beneficial in the M&O phase,
since systems with lifetimes greater than 1 or 2 years
usually have maintenance costs that range from 30 to 100
percent of the development cost.

To evaluate this measure, the software size and maintenance
error data will be used to compute the error rate for the
M&O phase. If the use of a V&I methodology improves the
quality of the software put into operation, the error rate
in the M&O phase will be smaller than the error rates of the
past projects. An error rate less than the average error
rate (0.5 to 0.6 errors per thousand lines of developed code)
for this application will be a positive indication of
success for this measure.

7. Maintain productivity and cost. Adding another
interaction for the development team will slow them down
and will, therefore, reduce their productivity and increase
the cost of development. However, if requirements and complex
design errors are reduced, if the cost of correcting errors
is reduced, and if the time spent on system and acceptance
testing is reduced, those reductions should offset the cost
of interaction between the development and V&I teams.
Therefore, productivity and development costs should remain
the same. We do not expect to offset the cost of the V&I
team completely, but optimistically speaking, we hope to.

To evaluate this measure, the software size data and the
weighted work hours charged to the projects by the
development teams will be used to compute (in staff-months)
the cost of 1000 lines of developed code. A cost less than or
equal to the average cost (1.7 staff-months per thousand
lines of developed code) for this application will be a
clear indication of success for this measure. That is to
say, an average cost for the development team plus an added
cost for the V&I team is a clear indication of success; the
development teams will have maintained productivity despite
the interaction with the V&I team.

By one calculation, the cost of interaction with the V&I
team is estimated to be 10 percent of the development
effort. Therefore, if the development teams are average in
performance and require only the average cost even though
they are interacting with a V&I team, the use of a V&I
methodology will have effected approximately a 10-percent
savings in development cost. If the use of a V&I methodology
works well, i.e., if the first six measures show positive
indications of success, then the combined cost of the
development and V&I teams will be close to the average cost
of development for this application. Since the cost of the
V&I effort will be approximately 15 percent of the
development effort and the estimated cost of interaction
with the V&I teams is 10 percent, a combined cost of the
development and V&I teams that is near the average
development cost will indicate approximately a 25-percent
savings in development cost (15 percent real savings).
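The computations described in the footnotes and in measure 7 are simple enough to state as code. A minimal sketch under stated assumptions: the role names, phase labels, and all sample numbers are hypothetical; the 1.5/1.0/0.5 weights and the effort-share-over-error-share rule come from the footnotes; and `vi_savings` reads "15 percent real savings" as the development team's own cost falling to 85 percent of the average, which is one plausible interpretation of the text rather than a formula the authors give.

```python
# Footnote weights for the development cost: managers 1.5, programmers 1.0, support 0.5.
ROLE_WEIGHTS = {"manager": 1.5, "programmer": 1.0, "support": 0.5}


def weighted_hours(hours_by_role: dict[str, float]) -> float:
    """Development cost in weighted hours."""
    return sum(ROLE_WEIGHTS[role] * hours for role, hours in hours_by_role.items())


def relative_correction_cost(effort_by_phase: dict[str, float],
                             errors_by_phase: dict[str, int]) -> dict[str, float]:
    """Share of error-correction effort in each phase divided by share of errors found there."""
    total_effort = sum(effort_by_phase.values())
    total_errors = sum(errors_by_phase.values())
    return {
        phase: (effort_by_phase[phase] / total_effort)
        / (errors_by_phase[phase] / total_errors)
        for phase in effort_by_phase
    }


def vi_savings(dev_cost: float, interaction: float = 0.10) -> tuple[float, float]:
    """Savings implied by the development team's cost, as fractions of the average cost.

    Returns (savings versus the expected cost including V&I interaction,
             real savings versus the average cost).
    """
    expected_with_interaction = 1.0 + interaction
    return expected_with_interaction - dev_cost, 1.0 - dev_cost


# Hypothetical project data, for illustration only.
print(weighted_hours({"manager": 100, "programmer": 1000, "support": 200}))
print(relative_correction_cost({"before": 60.0, "after": 40.0},
                               {"before": 80, "after": 20}))
# Scenario from the text: combined cost equals the average and the V&I team costs 15 percent.
print(vi_savings(dev_cost=1.0 - 0.15))
```

With `dev_cost = 0.85` the function returns about (0.25, 0.15), matching the 25-percent and 15-percent figures above; with `dev_cost = 1.0` it returns about (0.10, 0.0), the 10-percent case.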
V&I TEAM^


CHARTER: 

VERIFY REQUIREMENTS AND DESIGN 
PERFORM SEPARATE SYSTEM TESTING 
^SAME CONTRACTOR AS DEVELOPMENT TEAMS, BUT IN DIFFERENT OPERATIONAL AREA
Viewgraph 6: V&I Team

What did we expect the V&I team to do in this experiment? 

The V&I team was supposed to 

• Verify requirements and design.

• Perform separate system testing.

• Validate the consistency from start to end (from
requirements to product).

• Fix nothing.

• Report all findings.

The V&I process lasted 14 to 16 months and required an
effort of 16 to 18 percent of the development effort. The
process required an average of 1.1 people and peaked at
3 people (full-time equivalents). Six individuals were
involved, including the first-line managers. The application
and overall experience of the technical staff was similar to
that of the development teams (viewgraph 4); the managers,
however, had a little more experience.

The V&I team was associated with the same contractor as the
development teams but came from a different operational area.

Next, we will examine the results of the experiment.
Viewgraph 7: Measure 1 - Requirements Problems and
Measure 2 - Design Flaws


This viewgraph shows the breakdown, by percentages, of all 
the requirements and design errors detected from the start 
of implementation through the end of acceptance testing. 

1. Requirements Errors 
Expectation:

For requirements errors, we expect to see a 50-percent 
decrease in the percentage of requirements errors. 

Findings:

From the bar graphs, you can see that the percentage of 
requirements errors for both V&I projects was reduced 84 
to 90 percent compared with the past projects. In addi- 
tion, very few requirements remained unspecified in the 
later stages of development. Hence, there were very few 
late surprises in terms of requirements problems com- 
pared with the past projects. 

Conclusion:

The use of a V&I methodology did significantly decrease 
requirements ambiguities and misinterpretations. 

2. Design Errors 
Expectation:

For design errors, we expect to see a 50-percent de- 
crease in the percentage of complex design errors. Com- 
plex design errors are those involving many components. 
Simple design errors are single-component errors. A 
component is a subroutine or a shared block of code. 

Findings:

From the bar graphs, you can see that complex design errors
made up 26 and 23 percent of the total design errors for the
V&I projects. The percentages were a little lower for the two
past projects (23 and 18 percent).

Conclusion:

The use of a V&I methodology did not decrease complex 
design errors. 
MEASURE 5 - EARLY DISCOVERY OF FAULTS
Viewgraph 8: Measure 5 - Early Discovery of Faults

This viewgraph shows the percentage of all errors that were
found after acceptance testing started.

Expectation:

We expect to see a 50-percent reduction in the percentage of 
errors found after acceptance testing starts. 

Findings:

You can see that for the two V&I projects there was a slight 
decrease (less than 30 percent) in the percentage of errors 
found after acceptance testing started. 


Conclusion:

The use of a V&I methodology did not significantly increase
the early discovery of errors.


Additional Data:

The percentage of errors found in each phase is as follows:

Phase                               Past 1   Past 2   V&I 1   V&I 2
After Acceptance Testing Started     18.2     23.0     15.6    17.5
Before Acceptance Testing Started    81.8     77.0     84.4    82.5
Maintenance and Operation             3.4      5.3      5.0     6.9
Acceptance Testing                   14.8     17.7     10.6    10.6
System Testing                       14.8      4.8      8.2    18.9
Code/Unit Testing                    67.0     72.2     76.2    63.6
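The "less than 30 percent" decrease can be reproduced from the table. This sketch assumes the two past projects and the two V&I projects are each pooled by simple averaging, which the text does not specify; it relies on each "After" figure being the sum of the M&O and acceptance testing rows, which holds for every column above.

```python
# Percentage of all errors found in each phase, from the table above.
ERROR_SHARE = {
    "Past 1": {"M&O": 3.4, "AT": 14.8, "ST": 14.8, "CUT": 67.0},
    "Past 2": {"M&O": 5.3, "AT": 17.7, "ST": 4.8, "CUT": 72.2},
    "V&I 1": {"M&O": 5.0, "AT": 10.6, "ST": 8.2, "CUT": 76.2},
    "V&I 2": {"M&O": 6.9, "AT": 10.6, "ST": 18.9, "CUT": 63.6},
}


def after_acceptance(project: str) -> float:
    """Share of errors found after acceptance testing started (AT + M&O)."""
    phases = ERROR_SHARE[project]
    return phases["AT"] + phases["M&O"]


def relative_decrease(baseline: list[str], treated: list[str]) -> float:
    """Fractional drop in the mean after-acceptance share, treated vs. baseline."""
    base = sum(after_acceptance(p) for p in baseline) / len(baseline)
    new = sum(after_acceptance(p) for p in treated) / len(treated)
    return (base - new) / base


drop = relative_decrease(["Past 1", "Past 2"], ["V&I 1", "V&I 2"])
print(f"{drop:.0%}")  # 20%
```

Pooled this way, the past projects average 20.6 percent and the V&I projects 16.55 percent, a relative decrease of about 20 percent.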


This viewgraph and viewgraphs 9 through 11 contain M&O data 
through November 20, 1981. The length and status of the M&O 
phases are as follows: 


Project    M&O Phase (Months)    Status
Past 1            25             Active
Past 2            14             Complete
V&I 1              5             Active
V&I 2              6             Active
Except for project Past 2, which has ended, the results
presented in viewgraphs 8 through 11 can only become worse with
further operation. However, the results are not expected to
change appreciably because of the characteristics of the
environment. Typically, in this environment, 95 to 100
percent of the postacceptance error corrections and
enhancements occur during the first 6 months of M&O. For
example, what was expected to be the last planned modification
of the source code for both V&I projects occurred a few days
before November 20, 1981.

After the first 6 months of M&O, the software is typically
changed only to accommodate a degradation in satellite
hardware performance, e.g., failure of a primary sensor.
However, to support a launch, the software is engineered to
handle these types of contingencies, but not always accurately
enough for day-to-day operation. Since the usual lifetimes
of these projects range from 1 to 3 years, the users must
weigh the cost of extensive development to handle serious
or critical degradation in satellite hardware performance
against the benefit to be gained during the expected (and
usually shortened) life of the satellite. For example, about
a year ago, the satellite of project Past 1 (25 months of M&O)
had a critical hardware failure that seemed to end the
project prematurely; however, relatively simple modifications
to the software allowed the users to keep the satellite active
in a degraded mode of operation.
MEASURE 3 - COST OF CORRECTING FLAWS
Viewgraph 9: Measure 3 - Cost of Correcting Flaws

This viewgraph shows the relative cost of correcting errors
found after acceptance testing started. This number is the
ratio of the fraction of effort required to correct the
errors that occurred after acceptance testing started to the
fraction of errors that occurred after acceptance testing
started. For example, if 50 percent of the effort to
correct errors was expended after acceptance testing started
and if that effort was needed to correct 5 percent of the
errors, this number would be 10.
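The ratio defined above is straightforward to compute. A minimal sketch follows; the function name is hypothetical, and the range check on the error fraction is our own addition.

```python
def relative_cost(effort_fraction: float, error_fraction: float) -> float:
    """Fraction of total correction effort spent in a period divided by the
    fraction of all errors corrected in that period. A value of 1.0 means
    errors in that period took an average share of effort to correct."""
    if not 0 < error_fraction <= 1:
        raise ValueError("error_fraction must be in (0, 1]")
    return effort_fraction / error_fraction


# Worked example from the text: 50 percent of effort spent on 5 percent of errors.
print(relative_cost(0.50, 0.05))
```

For the worked example in the text this returns 10; values above 1.0 mark periods in which errors were more expensive than average to correct.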

Expectation:

We expect to see a 20- to 25-percent lower relative cost to 
correct errors after acceptance testing starts. 

Findings:

From the bar graphs, you can see that the relative cost to
correct errors after acceptance testing started was the same
as that for the past projects. The relative cost to correct
errors before acceptance testing started was approximately
0.5. This indicates that errors corrected after acceptance
testing started were between 4.4 and 4.9 times as costly to
correct as errors corrected before acceptance testing
started.

Conclusion:

The use of a V&I methodology did not decrease the cost of 
correcting errors in the acceptance testing and M&O phases 
combined. 
Additional Data:

The relative cost of correcting errors in each phase is as
follows:

Phase                               Past 1   Past 2   V&I 1   V&I 2
After Acceptance Testing Started     2.78     2.76     2.88    2.76
Before Acceptance Testing Started    0.60     0.47     0.59    0.63
Maintenance and Operation            4.85     4.53     4.09    3.54
Acceptance Testing                   2.31     2.23     2.31    2.26
System Testing                       1.00     1.09     1.30    1.08
Code/Unit Testing                    0.47     0.43     0.58    0.49
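Dividing each project's after-acceptance relative cost by its before-acceptance relative cost gives the "times as costly" factor quoted in the findings. The ratios come out near 4.6 (Past 1), 5.9 (Past 2), 4.9 (V&I 1), and 4.4 (V&I 2), so the 4.4-to-4.9 range quoted appears to correspond to the two V&I projects; that reading is our inference, not a statement in the text.

```python
# (after acceptance testing started, before acceptance testing started)
RELATIVE_COST = {
    "Past 1": (2.78, 0.60),
    "Past 2": (2.76, 0.47),
    "V&I 1": (2.88, 0.59),
    "V&I 2": (2.76, 0.63),
}


def after_to_before_ratios(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """How many times as costly a late error is as an early one, per project."""
    return {project: after / before for project, (after, before) in table.items()}


for project, ratio in after_to_before_ratios(RELATIVE_COST).items():
    print(f"{project}: {ratio:.1f}")
```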


These figures, in part, validate the common belief (advanced
by proponents of V&I methodology) that errors are more
expensive to correct when they are discovered later in the
development cycle. You can also see from these figures and
from the figures in the previous viewgraph that the results
differ from phase to phase; but remember that we do not have
responsibility for the maintenance phase, and data are more
difficult to obtain from the group that has responsibility.
Therefore, we measure things one or two phases earlier,
i.e., during acceptance testing or system testing.

The relative cost of correcting errors in the M&O phase was
lower for the V&I projects, mainly because there were fewer
requirements errors in that phase. The past projects had at
least twice as many requirements errors in that phase.
Viewgraph 10: Measure 4 - Cost of System and Acceptance Testing

This viewgraph shows the cost of the time spent in various
development calendar phases (not activity phases). Design
activity takes place in the design calendar phase, in the
code/unit testing (implementation) calendar phase, and even
in the system and acceptance testing calendar phase.
Detailed SEL data show that design activity ranges from 30 to
45 percent of the development effort. On the average,
however, only 23 percent of the development effort occurs
during the design calendar phase, i.e., the phase in which
only design-related activity is performed. The remaining
design activity is performed primarily during the
implementation phase because requirements change, previously
missing information is acquired, and design errors are found.
Since it is not unusual to receive requirements changes
during the system and acceptance testing phases, since some
previously missing information may be acquired during these
phases, and since design errors are also discovered in these
phases, some design activity occurs there, too.

This viewgraph also contains the average cost for each phase
and the highest and lowest costs for each phase for the
12 projects in our sample. The high and low costs do not
represent the cost of any one project; they show the most
and least money spent on each phase by any of the
12 projects.

Expectation:

We expect to see a reduction in the cost of the system test- 
ing and acceptance testing phases. 

Findings:

On the average, we spend 29 percent of our dollars on system
and acceptance testing. You can see that one V&I project
was below the average (26.6 percent) and the other above it
(31.1 percent). Together, they were equal to the average.
Both were lower than our two past projects.

Conclusion:

The use of a V&I methodology did not significantly decrease 
the cost of system and acceptance testing. 

Additional Data:

We do not have responsibility for the maintenance phase.

Our best estimate is that the maintenance costs for the four
projects are about 15 percent of the development costs. The
V&I projects had overheads of approximately 16 to 18 percent
to pay for the V&I effort.
MEASURE 6 - QUALITY OF SOFTWARE
Viewgraph 11: Measure 6 - Quality of Software


This viewgraph shows the errors per thousand lines of devel- 
oped code for various calendar phases. What is important 
here is the M&O phase. 

Expectation:

We expect to see an error rate in the M&O phase less than 
the average error rate for this application. 

Findings:

From the bar graphs, you can see that the error rates for 
the two V&I projects are not better than the error rates for 
the two past projects. The average error rate in the M&O 
phase is between 0.5 and 0.6 errors per thousand lines of 
developed code; both V&I projects had error rates higher 
than the average. 

Conclusion:

The use of a V&I methodology did not improve the quality of 
the software put into operation. 

Additional Data:

Error rates from the other phases provide an important track
record. Hypothetically, let us say that projects Past 1 and
V&I 2 were developing the same product. If we measured the
acceptance testing error rates, we would see that both had
error rates of 1.4 errors per thousand lines of developed
code. We would not be able to tell much about the projects
from that viewpoint. However, if we examined those
projects' error rates before acceptance testing, we would
see that project Past 1 had a preacceptance testing error
rate of 7.9 and project V&I 2 had a preacceptance testing
error rate of 10.6. From this, we might be able to predict
that project V&I 2 would have the worse M&O phase error rate.
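The reasoning in this hypothetical can be written as a simple tie-breaking rule. The sketch below is illustrative only: the class and function names are hypothetical, and treating the higher preacceptance rate as a predictor of the worse M&O rate is the heuristic suggested in the text, not a validated model.

```python
from dataclasses import dataclass


@dataclass
class PhaseErrorRates:
    """Errors per thousand lines of developed code, by period."""
    name: str
    pre_acceptance: float
    acceptance: float


def likely_worse_in_mo(a: PhaseErrorRates, b: PhaseErrorRates) -> str:
    """Hypothetical tie-breaker: when acceptance-testing error rates match,
    guess that the project with the higher preacceptance rate will have
    the worse M&O error rate."""
    if abs(a.acceptance - b.acceptance) > 1e-9:
        raise ValueError("only meaningful when acceptance rates are equal")
    return a.name if a.pre_acceptance > b.pre_acceptance else b.name


past1 = PhaseErrorRates("Past 1", pre_acceptance=7.9, acceptance=1.4)
vi2 = PhaseErrorRates("V&I 2", pre_acceptance=10.6, acceptance=1.4)
print(likely_worse_in_mo(past1, vi2))  # V&I 2
```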
Viewgraph 12: Measure 7 - Productivity/Cost


This viewgraph shows the cost (in staff-months) per thousand
lines of developed code (K DLOC).

Expectation:

We expect the V&I overhead costs to be an add-on cost to our 
average development cost. 

Findings:

Because of the interaction with the V&I team and some other
problems, we drove the productivity of the development teams
to the low end of our productivity range. Together, the two
V&I projects were about 85 percent more expensive than our
two past projects. Since the quality of the products was
not any better (see viewgraph 11), an 85-percent increase in
cost for the same product is a very expensive penalty to
pay. The cost of the development part of the V&I projects
(2.2 staff-months per K DLOC) was approximately 30 percent
higher than the average development cost (1.7 staff-months
per K DLOC). This excess is three times as large as the
estimated 10-percent cost of interaction with the V&I team.
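The "approximately 30 percent" and "three times" figures follow directly from the two costs quoted above and the 10-percent interaction estimate given earlier for this experiment. A short check, with hypothetical names:

```python
def excess_over_average(cost: float, average: float) -> float:
    """Fractional amount by which a cost exceeds the average cost."""
    return cost / average - 1.0


INTERACTION_ESTIMATE = 0.10  # estimated cost of interaction with the V&I team

excess = excess_over_average(2.2, 1.7)  # staff-months per K DLOC
print(f"{excess:.0%} above average, "
      f"{excess / INTERACTION_ESTIMATE:.1f}x the interaction estimate")
```

This prints "29% above average, 2.9x the interaction estimate", consistent with the rounded figures in the text.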

Conclusion:

The use of a V&I methodology is expensive. 
RESULTS OF V&I EXPERIMENT

FROM THE DATA WE HAVE USED, WE HAVE FOUND

LARGE DECREASE IN REQUIREMENTS AMBIGUITIES AND MISINTERPRETATIONS
NO DECREASE IN DESIGN FLAWS
NO DECREASE IN COST OF CORRECTING FLAWS
SMALL DECREASE IN COST OF SYSTEM AND ACCEPTANCE TESTING
SMALL INCREASE IN EARLY DISCOVERY OF FAULTS
NO INCREASE IN QUALITY OF SOFTWARE PUT INTO OPERATION
LARGE DECREASE IN PRODUCTIVITY; INCREASE IN COST

SCORE: 1 PLUS; 5 ZEROS; 1 DOUBLE MINUS





Viewgraph 13: Results of V&I Experiment

From the data we have used, which includes resource data, 
error data, and the software, we have found that a V&I meth- 
odology provided 

1. A large decrease in requirements ambiguities and
misinterpretations. There were very few late surprises in
terms of requirements problems, and the number of
requirements errors reported was significantly lower than
for the past projects.

2. No decrease in design errors. The fraction of complex
design errors was similar to that of the past projects.

3. No decrease in the cost of correcting errors. The
relative cost of correcting errors that occurred after
acceptance testing started was the same as that for the past
projects.

4. A small decrease in the cost of system and acceptance
testing. One V&I project had a system and acceptance
testing cost below the average; the other V&I project was
above the average. However, both V&I projects had costs
below those of the past projects used in the comparison.

5. A small increase in early discovery of errors. For
both V&I projects, the percentage of errors that occurred
after acceptance testing started was lower than for the
past projects.

6. No improvement in the quality of software put into
operation. The error rates in the M&O phase for both V&I
projects were higher than the average error rate for
software put into operation for this class of application.

7. A decrease in productivity and an increase in
cost. The cost was high, in part because the interaction of
the V&I and development teams lowered productivity and
because there were no savings in correcting errors.

We scored a plus with the first measure (requirements
problems), zero with the next five measures, and a double
minus with the last measure (productivity/cost).
SUMMARY


FIRST APPLICATION OF V&I IN THIS
ENVIRONMENT

- DID NOT IMPROVE PROCESS 

- WAS EXPENSIVE 

- WAS A MANAGEMENT HEADACHE 

HOWEVER, WITH VARIATIONS, WE WILL 
ENCOURAGE ITS USE FOR 

- THE RIGHT SIZE EFFORT 

- THE RIGHT RELIABILITY REQUIREMENT 



Viewgraph 14: Summary

For our first application of a V&I methodology in this
environment:

• V&I did not improve the process 

• V&I was very expensive 

• V&I was a management headache 

To qualify this, our experience with many methodologies has 
been as follows: 

• The first time a methodology is applied, mistakes 
are made (and we made many mistakes) , and many of 
the potential benefits or advantages of the method- 
ology are not realized. 

• The second time a methodology is applied, there is
a tendency to overcompensate for the things that
you did worst the first time, and the methodology
still does not work as well as it potentially could.

• The third time a methodology is applied, you lower 
your expectations somewhat or modify them, and you 
home in on what is right for your environment. 

In general, development teams are at the bottom of the totem
pole in this environment. Because they work in an
operations environment, they have low priority for accessing
the machines. They have adversarial relationships with the
analysis/requirements team, the team that conducts
acceptance testing, the people who schedule computer time,
the computer operators, the programmer assistance center,
and the customer. The V&I team members, who are like a
development team but do not design or implement, have the
same adversaries. Placing a V&I team in this environment
creates another adversary for both the development team and
the development-like V&I team. The manager who monitors both
teams (the customer) has twice as many complaints, computer
problems, priority decisions, schedule problems, cost
problems, reporting problems, and conflicts to deal with. The
V&I experiment was a management headache.

However, we believe that we know what changes are needed and 
how to moderate them to make the use of a V&I methodology 
more cost effective in this environment for 

• The right size effort 

• The right reliability requirement 

Most of our projects require 8+4 staff-years of effort. We
believe that a V&I methodology will be cost effective in the
10- to 12-staff-year range and that cost savings will be
achieved for larger efforts. All our completed projects
have been for ground-based software, but we have started to
develop some onboard (flight) prototype systems. For these
systems, which have a more stringent reliability
requirement, we believe that a V&I methodology will be cost
effective for 5- to 6-staff-year efforts. In both these
cases, we believe that a V&I effort of approximately
15 percent of the development effort is sufficient for our work.
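The sizing guidance above can be expressed as a small decision rule. This is a hypothetical encoding, not a procedure stated by the authors: using the lower end of each quoted range as a hard cutoff is our simplification, and all names are illustrative.

```python
# Hypothetical encoding of the thresholds stated above; the lower end of
# each quoted range is used as a cutoff, which is a simplification.
GROUND_THRESHOLD_STAFF_YEARS = 10.0   # "10- to 12-staff-year range"
FLIGHT_THRESHOLD_STAFF_YEARS = 5.0    # "5- to 6-staff-year efforts"
VI_FRACTION = 0.15                    # V&I effort relative to development


def vi_expected_cost_effective(staff_years: float, onboard: bool) -> bool:
    """True if the effort reaches the size at which V&I is expected to pay off."""
    threshold = FLIGHT_THRESHOLD_STAFF_YEARS if onboard else GROUND_THRESHOLD_STAFF_YEARS
    return staff_years >= threshold


def vi_budget(development_staff_years: float) -> float:
    """V&I effort sized at roughly 15 percent of the development effort."""
    return development_staff_years * VI_FRACTION


print(vi_expected_cost_effective(8, onboard=False))   # False
print(vi_expected_cost_effective(6, onboard=True))    # True
print(round(vi_budget(12), 2))                        # 1.8
```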



THE VIEWGRAPH MATERIALS FOR THE
J. PAGE PRESENTATION WERE
INCORPORATED IN PAPER
EVALUATING SOFTWARE DEVELOPMENT CHARACTERISTICS:

ASSESSMENT OF SOFTWARE MEASURES IN THE 
SOFTWARE ENGINEERING LABORATORY 

Victor R. Basili

University of Maryland 
College Park, MD 20742 

The purpose of this presentation is to discuss some of the work done
on metrics in the Software Engineering Laboratory. To put things in
perspective, there are many factors that affect software quality, and each
of these factors has several criteria that define it. Metrics represent some
sort of measurement of whether or not we have achieved a particular criterion.

For example, one factor that we would like the software to possess is
reliability. One of the many criteria that make up this generalized
factor of reliability might be fault tolerance. One of the metrics that can
be used to evaluate fault tolerance might be the number of crashes of the
system.
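The factor, criterion, and metric hierarchy described above maps naturally onto nested records. The sketch below is illustrative only; the class names and the `metric_names` helper are hypothetical, and the single example chain is the one given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str


@dataclass
class Criterion:
    name: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Factor:
    name: str
    criteria: list[Criterion] = field(default_factory=list)

    def metric_names(self) -> list[str]:
        """All metrics that bear on this factor, reached through its criteria."""
        return [m.name for c in self.criteria for m in c.metrics]


reliability = Factor("reliability", [
    Criterion("fault tolerance", [Metric("number of system crashes")]),
])
print(reliability.metric_names())  # ['number of system crashes']
```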

There are many views of metrics. We can think of metrics as being
subjective or objective. Subjective metrics normally do not involve any
exact measurement; they tend to be an estimate of extent or degree in the
application of some technique, or a classification or qualification of a
problem or experience. Subjective metrics are usually done on a relative
scale; e.g., they may be binary (yes or no) or discrete numbers (0, 1, 2, 3).
Examples of subjective metrics would be a qualitative judgment on the use of
a Process Design Language or an evaluation of the experience of programmers
in a particular application.

Objective metrics, on the other hand, tend to be absolute measures
taken on the product or process: for example, the time of development,
the number of lines of code delivered, the productivity in lines of code
per staff-month, and the number of errors or changes associated with the
project. The distinction between subjective and objective metrics is often
a little fuzzy. Very often we make a metric subjective because we don't
know how to quantify it.

Another characterization of metrics is as product or process metrics.
Product metrics measure the developed product, such as the source code, the
object code, or the documentation. Such metrics might be lines of code
(an objective metric) or readability of the source code (a subjective metric).
Process metrics tend to measure the process model used for developing the
product. Use of methodology (a subjective metric) and effort in staff-months
(an objective metric) are two metrics that measure the process.

Another characterization is to think of metrics as being cost or quality
metrics. Cost can clearly be regarded as a quality metric. However, a
typical goal in software development is to minimize cost and maximize
quality, so we will consider these as separate views. Cost normally
involves the expenditure of resources in dollars, which might include some
capital investment, and this metric is usually normalized according to some
value component. For example, we measure staff-months, or productivity in
terms of dollars received for dollars spent, output for dollars spent, or
size per time slice. Quality metrics, on the other hand, measure some form
of the value of the product. For example, the mean time to failure of the
product, the ease of change, the correctness, and the number of errors
remaining are all quality measures.
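The three views above are independent axes, so one way to record them is as separate tags on each metric. This sketch is illustrative: the enum and class names are hypothetical, the examples are the ones the text classifies, and the cost/quality tag is left unset wherever the text does not assign one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Basis(Enum):
    SUBJECTIVE = "subjective"   # rating on a relative scale
    OBJECTIVE = "objective"     # absolute measurement


class Target(Enum):
    PRODUCT = "product"
    PROCESS = "process"


class Concern(Enum):
    COST = "cost"
    QUALITY = "quality"


@dataclass(frozen=True)
class MetricView:
    name: str
    basis: Basis
    target: Target
    concern: Optional[Concern] = None  # unset where the text does not say


EXAMPLES = [
    MetricView("lines of code", Basis.OBJECTIVE, Target.PRODUCT),
    MetricView("readability of the source code", Basis.SUBJECTIVE, Target.PRODUCT),
    MetricView("use of methodology", Basis.SUBJECTIVE, Target.PROCESS),
    MetricView("effort in staff-months", Basis.OBJECTIVE, Target.PROCESS, Concern.COST),
]

subjective = [m.name for m in EXAMPLES if m.basis is Basis.SUBJECTIVE]
print(subjective)
```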

Use of Metrics 

We use metrics in varying ways: to characterize, to evaluate, or to
predict. Almost all metrics fit in the characterizing category. In that
role, a metric helps to distinguish the product, process, or environment.
For example, we may categorize an environment by the use of a methodology,
the number of externally generated changes, or the size. This allows us to
compare environments, products, or processes.

Not all characterizing metrics are evaluative. A metric is considered
evaluative if it correlates with, or directly shows, the quality of the
process or the product. For example, the number of errors recorded during
acceptance testing or the productivity achieved in developing a software
project gives us some way of evaluating whether the product has reasonable
reliability or whether the development is cost effective.

The most powerful capability a metric can have is prediction; that is, 
the measure is estimable or calculable and is used to predict another 
measure. For example, estimating size as a predictor of effort is a way to 
use an estimable metric to predict some desired information. 

Demonstrating that a particular metric evaluates or predicts requires
validation. Too often, metrics are proposed in the literature that are
meant to be evaluative or predictive, but that capability is not established
by experiment or case study.

Analyzing Objective Metrics in the Software Engineering Laboratory 

In a paper presented at the Sigmetrics Workshop (Basili/Phillips), we tried
to use the laboratory project data to study the relationship between various
metrics of size and complexity. One question raised was whether we could
predict effort, a cost measure, and the number of errors, a quality metric,
using the various size and complexity metrics that appear in the literature.
A second question was whether we could check the internal
consistency of several of those size and complexity metrics. The metrics
used are given in Table 1. The relationships among the various complexity
metrics appear in Table 2, which gives the Pearson correlation coefficients.
As can be seen from this table, several of the complexity and size metrics
correlate well with one another. On the other hand, the change metrics do
not correlate well. In trying to use combinations of these metrics to predict
effort and errors, we see from Table 3 that there is some success in
accounting for effort with some of the metrics, but less success in
accounting for errors.

OBJECTIVE SIZE AND COMPLEXITY MEASURES STUDIED

SRC : SOURCE LINES OF CODE INCLUDING COMMENTS
XQT : EXECUTABLE STATEMENTS
SOFTWARE SCIENCE METRICS
   N : LENGTH IN OPERATORS AND OPERANDS
   V : VOLUME
   V* : POTENTIAL VOLUME
   L : LEVEL
   E : EFFORT
CYC : CYCLOMATIC COMPLEXITY
CLS : NUMBER OF CALL STATEMENTS
CAJ : CALLS AND JUMPS
CHG : CHANGES TO THE SOURCE CODE
REV : NUMBER OF REVISIONS (VERSIONS) IN THE LIBRARY
EFF : NUMBER OF HOURS EXPENDED IN DEVELOPMENT
ERR : NUMBER OF ERRORS ASSOCIATED WITH COMPONENT

Table 1

RELATIONSHIP BETWEEN SIZE AND COMPLEXITY METRICS

        REV     CHG     XQT     SRC     CAJ     CYC     CLS
E      .6750   .2407   .8390   .8706   .8742   .8906   .7966
CLS    .6427   .3579   .7594   .8186   .9648   .8651
CYC    .7921   .2534   .9253   .9519   .9666
CAJ    .7439   .3158   .8734   .9176
SRC    .8415   .2942   .9896
XQT    .8560   .2920
CHG    .4229

Table 2
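Tables 2 and 3 report Pearson correlation coefficients. A minimal, dependency-free sketch of the coefficient follows; the component measurements in the usage lines are hypothetical values for illustration, not SEL data. Python 3.10 and later also provide `statistics.correlation` for the same purpose.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-component counts, for illustration only (not SEL data).
src = [120, 45, 300, 80, 210]   # source lines including comments
xqt = [70, 30, 160, 50, 140]    # executable statements
print(round(pearson(src, xqt), 3))
```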


PREDICTING EFFORT AND ERRORS USING
SIZE AND COMPLEXITY METRICS

        EFF      ERR
EFF              .6396
CLS    .7977     .5709
CYC    .7399     .5592
CAJ    .7957     .5898
SRC    .7583     .5576
XQT    .7???     .5985
REV    .7122     .6739
E      .6612     .5932
CHG    .9799

Table 3
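Table 1 lists the software science metrics N, V, V*, L, and E without their formulas. For reference, the sketch below computes them with Halstead's standard textbook definitions, together with the estimated length N^ and estimated level L^ that Table 4 compares with their counted counterparts. We assume, without confirmation from the text, that the SEL used these standard forms; taking E^ as V / L^ is one common convention, and the function name, keys, and sample counts are hypothetical.

```python
import math


def software_science(n1: int, n2: int, N1: int, N2: int,
                     n2_star: int) -> dict[str, float]:
    """Halstead software science measures (standard textbook definitions).

    n1, n2: number of distinct operators and operands
    N1, N2: total occurrences of operators and operands
    n2_star: number of input/output parameters, used for potential volume
    """
    if min(n1, n2, N1, N2) < 1 or n2_star < 0:
        raise ValueError("counts must be positive")
    N = N1 + N2                                        # length
    V = N * math.log2(n1 + n2)                         # volume
    V_star = (2 + n2_star) * math.log2(2 + n2_star)    # potential volume
    L = V_star / V                                     # level
    L_hat = (2 / n1) * (n2 / N2)                       # estimated level
    return {
        "N": N,
        "N_hat": n1 * math.log2(n1) + n2 * math.log2(n2),  # estimated length
        "V": V,
        "V_star": V_star,
        "L": L,
        "L_hat": L_hat,
        "E": V / L,                                    # effort
        "E_hat": V / L_hat,                            # effort from estimated level
    }


# Hypothetical counts for one component, for illustration only.
m = software_science(n1=10, n2=8, N1=40, N2=30, n2_star=3)
print(round(m["N_hat"], 1), m["N"])
```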
Another study looked at the internal validation of some of the metrics.
Specifically, the software science metrics were examined to see whether the
predicted and actual values of some of the metrics were related. Again,
Pearson's correlation was used; the results are given in Table 4. One can see
from this table that the length metrics, that is, the estimated length N^ and
the actual length N, do correlate. The relationship between V and V* is not
bad, although it is probably the worst in the group. It should be noted that
the data are broken into two groups: small components of 50 lines or less,
and large components of more than 50 lines.

INTERNAL VALIDATION

                 LARGE COMPONENTS      SMALL COMPONENTS
                 (> 50 LINES; 285)     (<= 50 LINES; 280)

N^ - N                 .79                   .83
V - V*                 .52                   .50
L^ - L                 .71                   .62
E^ - E                 .51                   .42

PEARSON CORRELATION

Table 4

Based on this study, we drew the following conclusions. First, there does
exist some relationship between complexity metrics and effort and errors.
However, most of the complexity metrics do not do much better at estimation
than lines of code or executable statements. On the other hand, many of the
metrics relate very well to each other, which seems to imply that they really
are measuring the same thing. Our efforts should therefore concentrate on
orthogonal metrics. We are currently investigating data metrics in the SEL.

Using Subjective and Objective Metrics to Predict Cost

In a paper presented at the 5th International Conference on Software
Engineering (Bailey/Basili), we inverted that experiment by examining the
relationship between productivity and various factors, using nonparametric
statistics. The results were as follows. We found no significant
relationship between productivity and size. However, there was a large set
of methodology factors that showed varying degrees of positive correlation
with productivity. A combined methodology factor that was used to predict
cost or effort in the cost model showed a significant positive correlation
with productivity, as might have been expected. In this study, projects with
a high methodology rating were shown to have come from a different population
than those with a low methodology rating. No other factor showed a
significant positive correlation with productivity, and we were able to show,
at least in the SEL environment, that methodology does correlate with
productivity and therefore has been an effective approach to software
development.
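The text does not name the specific nonparametric tests used. As one example of a nonparametric measure of association, Spearman's rank correlation replaces each value by its rank (averaging ranks across ties) and then computes Pearson's coefficient on the ranks, so it depends only on the ordering of, say, methodology ratings and productivity values. The data in the usage line are hypothetical.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("undefined when all values in a sample are tied")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings and productivity values, illustration only.
print(spearman([1, 2, 3, 4], [10, 20, 30, 1000]))  # 1.0
```

Because only ranks matter, the extreme last productivity value does not weaken the association the way it could for Pearson's coefficient.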

Using Subjective Metrics to Predict Quality 

Building on the study to predict productivity, but changing the statistical
approach to factor analysis, we compressed three sets of metrics into three
factors: quality, methodology, and complexity. Methodology and complexity
were not significantly correlated in the study. However, quality was
significantly correlated with methodology, with a correlation (R) of .67, and
with complexity, with a correlation (R) of -.64. In both cases, the
correlation was significant at the .001 level.

Using methodology alone to predict quality, the coefficient of determination
(R²) is equal to .45. This means that methodology accounted for about 45
percent of the variation in the quality rating. Using both methodology and
complexity, we got an R² of .65. This implies that there is some evidence
that we can predict quality from methodology and complexity, and that
methodology is again highly correlated not just with productivity, as we saw
in the previous study, but also with quality. Work in this particular area is
just beginning, and we plan to make extensive use of subjective metrics, not
just for evaluation, but also for prediction.
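With a single predictor, the coefficient of determination is the square of the correlation, so the R of .67 reported for methodology gives the R² of .45 quoted above. The same arithmetic applied to complexity's R of -.64 gives about .41; that figure is our calculation, not one reported in the text. The two-predictor R² of .65 cannot be reproduced from these two values alone, because it also depends on the correlation between methodology and complexity, which is not given.

```python
def coefficient_of_determination(r: float) -> float:
    """With a single predictor, R^2 is simply the square of the correlation R."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return r * r


print(round(coefficient_of_determination(0.67), 2))    # 0.45 (methodology)
print(round(coefficient_of_determination(-0.64), 2))   # 0.41 (complexity)
```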
VIEWS OF METRICS


SUBJECTIVE VS. OBJECTIVE


SUBJECTIVE:

NO EXACT MEASUREMENT

AN ESTIMATE OF EXTENT OR DEGREE IN THE APPLICATION
OF SOME TECHNIQUE

A CLASSIFICATION OR QUALIFICATION OF PROBLEM OR
EXPERIENCE

USUALLY DONE ON A RELATIVE SCALE
E.G., USE OF A PDL
EXPERIENCE OF THE PROGRAMMERS IN THE APPLICATION

OBJECTIVE:

AN ABSOLUTE MEASURE TAKEN ON THE PRODUCT OR PROCESS

E.G., TIME FOR DEVELOPMENT
NUMBER OF LINES OF CODE
PRODUCTIVITY
NUMBER OF ERRORS OR CHANGES
VIEWS OF METRICS


PRODUCT VS. PROCESS

PRODUCT:

MEASURE OF THE ACTUAL DEVELOPED PRODUCT
I.E., SOURCE CODE, OBJECT CODE, DOCUMENTATION
E.G., LINES OF CODE, READABILITY OF THE SOURCE CODE

PROCESS:

MEASURE OF THE PROCESS MODEL USED FOR DEVELOPING
THE PRODUCT

E.G., USE OF METHODOLOGY, EFFORT IN STAFF MONTHS


COST VS. QUALITY

COST:

EXPENDITURE OF RESOURCES IN DOLLARS INCLUDING
CAPITAL INVESTMENT, USUALLY NORMALIZED ACCORDING
TO SOME VALUE COMPONENT

E.G., STAFF MONTHS, PRODUCTIVITY, SIZE/TIME SLICE


QUALITY:

SOME FORM OF VALUE OF THE PRODUCT
E.G., RELIABILITY, EASE OF CHANGE, CORRECTNESS,
NUMBER OF ERRORS REMAINING
USE OF METRICS


PREDICTIVE VS. EVALUATIVE VS. CHARACTERIZING

CHARACTERIZING:

MEASURE HELPS DISTINGUISH THE PRODUCT OR PROCESS
OR ENVIRONMENT

E.G., USE OF A METHODOLOGY, NUMBER OF EXTERNALLY
GENERATED CHANGES, SIZE

EVALUATIVE:

MEASURE CORRELATES WITH OR SHOWS DIRECTLY THE QUALITY
OF THE PROCESS OR PRODUCT
E.G., NUMBER OF ERRORS REPORTED DURING ACCEPTANCE
TESTING, PRODUCTIVITY

PREDICTIVE:

MEASURE IS ESTIMATABLE OR CALCULABLE AND IS USED TO
PREDICT ANOTHER MEASURE

E.G., ESTIMATING SIZE AS A PREDICTOR OF EFFORT
USE REQUIRES VALIDATION
ANALYZING OBJECTIVE MEASURES
IN THE SEL 

USING SEL PROJECT DATA TO STUDY THE RELATIONSHIP BETWEEN 
VARIOUS METRICS OF SIZE AND COMPLEXITY 

PREDICTING EFFORT (A COST MEASURE) AND NUMBER OF ERRORS 
(A QUALITY METRIC) USING SIZE AND COMPLEXITY METRICS 

CHECKING THE INTERNAL CONSISTENCY OF SEVERAL SIZE AND 
COMPLEXITY METRICS 
OBJECTIVE SIZE AND COMPLEXITY MEASURES STUDIED

SRC : SOURCE LINES OF CODE INCLUDING COMMENTS 

XQT : EXECUTABLE STATEMENTS 

SOFTWARE SCIENCE METRICS 

N : LENGTH IN OPERATORS AND OPERANDS 

V : VOLUME 

V* : POTENTIAL VOLUME 

L : LEVEL 

E : EFFORT 

CYC : CYCLOMATIC COMPLEXITY 

CLS : NUMBER OF CALL STATEMENTS 

CAJ : CALLS AND JUMPS 

CHG : CHANGES TO THE SOURCE CODE 

REV : NUMBER OF REVISIONS (VERSIONS) IN THE LIBRARY 

EFF : NUMBER OF HOURS EXPENDED IN DEVELOPMENT 

ERR : NUMBER OF ERRORS ASSOCIATED WITH COMPONENT 
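The software science measures listed above are Halstead's, but the slide names them without giving formulas. The sketch below assumes the standard Halstead definitions and takes the operator/operand counts as given input (extracting them from source code is not shown). The names `HalsteadCounts`, `software_science` and `n2_star` (the number of conceptually unique input/output operands that V* needs) are illustrative, not taken from the SEL study. The estimated values N_hat, L_hat and E_hat are the kind of estimators compared against N, L and E under INTERNAL VALIDATION.

```python
import math
from dataclasses import dataclass


@dataclass
class HalsteadCounts:
    """Operator/operand counts for one component (assumed to come from a parser)."""
    n1: int       # number of distinct operators
    n2: int       # number of distinct operands
    N1: int       # total operator occurrences
    N2: int       # total operand occurrences
    n2_star: int  # conceptually unique input/output operands (for V*)


def software_science(c: HalsteadCounts) -> dict:
    """Standard Halstead measures plus the usual estimators (hatted values)."""
    N = c.N1 + c.N2                                          # length
    V = N * math.log2(c.n1 + c.n2)                           # volume
    V_star = (2 + c.n2_star) * math.log2(2 + c.n2_star)      # potential volume
    L = V_star / V                                           # level
    E = V / L                                                # effort (= V**2 / V*)
    N_hat = c.n1 * math.log2(c.n1) + c.n2 * math.log2(c.n2)  # estimated length
    L_hat = (2 / c.n1) * (c.n2 / c.N2)                       # estimated level
    E_hat = V / L_hat                                        # effort from estimated level
    return {"N": N, "V": V, "V*": V_star, "L": L, "E": E,
            "N_hat": N_hat, "L_hat": L_hat, "E_hat": E_hat}


# Hypothetical counts for a single small component.
metrics = software_science(HalsteadCounts(n1=10, n2=8, N1=40, N2=30, n2_star=4))
for name, value in metrics.items():
    print(f"{name:6s} {value:10.3f}")
```

Because E = V/L and L = V*/V, effort reduces to V²/V*, which gives a quick consistency check on any implementation.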
PREDICTING EFFORT AND ERRORS USING
SIZE AND COMPLEXITY METRICS

          EFF      ERR
EFF                .6346
CLS      .7977     .5704
CYC      .7399     .5592
CAJ      .7957     .5848
SRC      .7583     .5576
XQT      .7??      .5485
REV      .7122     .6734
E        .6612     .5432
CHG      .4799
RELATIONSHIP BETWEEN SIZE AND COMPLEXITY METRICS

        REV     CHG     XQT     SRC     CAJ     CYC     CLS
E      .6750   .2407   .8390   .8706   .8742   .8906   .7966
CLS    .6427   .3579   .7594   .8186   .9648   .8651
CYC    .7921   .2534   .9253   .?519   .9666
CAJ    .7439   .3158   .8734   .9176
SRC    .8415   .2942   .9896
XQT    .8560   .2920
CHG    .4229

INTERNAL VALIDATION

SMALL COMPONENTS < 50 LINES (280)
LARGE COMPONENTS > 50 LINES (285)

              LARGE    SMALL
N̂ - N          .79      .83
V - V*         .52      .50
L̂ - L          .71      .62
Ê - E          .61

PEARSON CORRELATION
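Each coefficient in these tables is a Pearson correlation between two per-component measures. A minimal, dependency-free sketch of that computation follows. The `src` and `eff` values are invented for illustration and are not SEL data; the 50-line cut mirrors the small/large split above, but which side of the cut a 50-line component falls on is an assumption.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical per-component values (NOT SEL data).
src = [30, 45, 120, 300, 80, 210, 25, 60]   # source lines
eff = [12, 20, 60, 170, 35, 90, 8, 30]      # hours of effort

print(f"all components: r = {pearson_r(src, eff):.3f}")

# The same computation restricted to components above / at-or-below 50 lines.
large = [(s, e) for s, e in zip(src, eff) if s > 50]
small = [(s, e) for s, e in zip(src, eff) if s <= 50]
for label, group in (("large", large), ("small", small)):
    xs, ys = zip(*group)
    print(f"{label}: r = {pearson_r(xs, ys):.3f}")
```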
CONCLUSION


• CAN USE COMMERCIALLY-OBTAINED DATA TO VALIDATE COMPLEXITY METRICS 

• VALIDITY CHECKS AND ACCURACY RATINGS ARE VITAL 

• THERE EXIST RELATIONSHIPS BETWEEN COMPLEXITY METRICS AND EFFORT
AND ERROR COUNTS

• THE BETTER THE DATA, THE BETTER THE RESULTS

• COMPLEXITY METRICS DON'T DO MUCH BETTER THAN LINES OF CODE OR
EXECUTABLE STATEMENTS

• METRICS RELATE WELL WITH EACH OTHER (MEASURING THE SAME THING)
USING SUBJECTIVE AND OBJECTIVE METRICS
TO PREDICT COST (EFFORT) 


A META-MODEL WAS DEVELOPED FOR DERIVING AN INDIVIDUALIZED 
COST MODEL FOR THE LOCAL ENVIRONMENT 

IT ASSUMES EACH ENVIRONMENT IS DIFFERENT AND IS CLASSIFIABLE 
BY A SET OF FACTORS (CAPTURED USING SUBJECTIVE METRICS) 

SOME FACTORS ARE CONSTANT ACROSS THE ENVIRONMENT AND ARE 
HIDDEN IN A BASIC SIZE/EFFORT EQUATION BASED UPON 
PAST HISTORY WITHIN THE ENVIRONMENT 

OTHER FACTORS CAUSE DIFFERENCES BETWEEN PROJECTS AND CAN BE 
USED TO EXPLAIN THE DIFFERENCE BETWEEN ACTUAL EFFORT 
AND EFFORT AS PREDICTED BY THE BASIC SIZE/EFFORT 
EQUATION 

CAN PREDICT COST (EFFORT) WITH THE USE OF SUBJECTIVE METRICS 
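The slide does not give the form of the basic size/effort equation or say how factor ratings enter the model. As a hedged illustration only, the sketch below assumes a power-law baseline, effort = a * size^b, fitted by least squares in log-log space to the environment's own history, and then regresses the log of actual/baseline effort on a single subjective factor score. The function names, the single-factor simplification and all data are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def build_meta_model(size_ksloc: Sequence[float],
                     effort_mm: Sequence[float],
                     factor_score: Sequence[float]) -> Callable[[float, float], float]:
    """Two-stage local cost model fitted to one environment's project history.

    Stage 1 (baseline): effort = a * size**b, fitted in log-log space; influences
    that are constant across the environment end up inside a and b.
    Stage 2 (adjustment): log(actual / baseline) is regressed on a subjective
    factor score that differs from project to project.
    """
    log_a, b = fit_line([math.log(s) for s in size_ksloc],
                        [math.log(e) for e in effort_mm])
    a = math.exp(log_a)
    log_ratio = [math.log(e / (a * s ** b)) for s, e in zip(size_ksloc, effort_mm)]
    c0, c1 = fit_line(factor_score, log_ratio)

    def predict(size: float, score: float) -> float:
        return a * size ** b * math.exp(c0 + c1 * score)

    return predict


# Hypothetical project history for one environment.
sizes = [12.0, 20.0, 35.0, 50.0, 80.0]       # KSLOC
efforts = [30.0, 55.0, 90.0, 150.0, 210.0]   # man months
scores = [3.0, 2.0, 4.0, 1.0, 3.0]           # e.g. a methodology rating
predict = build_meta_model(sizes, efforts, scores)
print(f"{predict(40.0, 3.0):.1f} man months")
```

Working in log space makes the factor adjustment multiplicative, so a one-point change in the rating scales the baseline estimate by the same ratio regardless of project size.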
EVALUATING THE EFFECT OF VARIOUS
FACTORS ON PRODUCTIVITY 

WE EXAMINED THE RELATIONSHIP BETWEEN PRODUCTIVITY AND VARIOUS 
FACTORS 

FOUND NO SIGNIFICANT RELATIONSHIP BETWEEN PRODUCTIVITY AND SIZE 

A LARGE SET OF METHODOLOGY FACTORS SHOWED VARYING DEGREES OF 
POSITIVE CORRELATION WITH PRODUCTIVITY 

A COMBINED METHODOLOGY FACTOR SHOWED A SIGNIFICANT POSITIVE 
CORRELATION WITH PRODUCTIVITY 

[PROJECTS WITH A HIGH METHODOLOGY RATING CAME FROM A DIFFERENT
POPULATION THAN THOSE WITH A LOW METHODOLOGY RATING]

NO OTHER FACTORS SHOWED A SIGNIFICANT POSITIVE CORRELATION 
WITH PRODUCTIVITY 

METHODOLOGY IS CORRELATED WITH PRODUCTIVITY 
USING SUBJECTIVE METRICS TO PREDICT QUALITY


WE COMPRESSED THREE SETS OF METRICS INTO THREE FACTORS:
QUALITY, METHODOLOGY, AND COMPLEXITY

METHODOLOGY AND COMPLEXITY WERE NOT SIGNIFICANTLY 
CORRELATED 

QUALITY WAS SIGNIFICANTLY CORRELATED WITH
METHODOLOGY (R = .67) AND COMPLEXITY (R = -.64)
AT LESS THAN THE .001 SIGNIFICANCE LEVEL

USING METHODOLOGY ALONE TO PREDICT QUALITY, R² = .45

USING METHODOLOGY AND COMPLEXITY, R² = .65

THERE IS EVIDENCE WE CAN PREDICT QUALITY FROM 
METHODOLOGY AND COMPLEXITY 

METHODOLOGY IS CORRELATED WITH QUALITY 
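How much a second predictor adds depends on how strongly it correlates with the first. As a worked check using standard least-squares algebra (not taken from the slides): with one predictor R² is simply r², and .67² ≈ .45 agrees with the methodology-only figure. With two predictors, R² follows from the three pairwise correlations. The slide does not report the methodology/complexity correlation, so `r_12` is left as an input; the function name is illustrative.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 of an OLS fit (with intercept) of y on two predictors x1 and x2.

    r_y1, r_y2: correlations of y with each predictor;
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# One predictor: R^2 is simply r^2 (methodology alone, r = .67).
print(round(0.67 ** 2, 2))  # 0.45

# Uncorrelated predictors: the two r^2 values simply add.
print(r_squared_two_predictors(0.5, 0.5, 0.0))  # 0.5
```

When the predictors are correlated, part of what the second one explains is already explained by the first, which is why the gain from adding complexity is smaller than its own r² would suggest in that case.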
SOFTWARE METRICS


The Quantitative Impact of Four 
Factors on Work Rates Experienced During 
Software Development 


John E. Gaffney, Jr. 
Robert W. Judge 
IBM Corporation 
Federal Systems Division 
Manassas, Virginia 22110 


Abstract 

This paper describes a model of the software development process
which is being used at the IBM Federal Systems Division. The model
considers the software development process to consist of a sequence of
activities, such as "program design" and "module development" (or coding).
A manpower estimate is made by multiplying code size by the rates (man
months per thousand lines of code) for each of the activities relevant
to the particular case of interest and summing the results. The
effect of four objectively determinable factors (organization, software
product type, computer type, and code type) on the productivity values
for each of nine principal software development activities has been
assessed. The analysis indicates that these four factors account
for 39% of the observed productivity variation.
Software Cost Analysis By Work Components


Software development costs may be estimated by considering each of
the activities or work components that constitute a particular software
development process. These components are the basis for a software
engineering management model (2) used by the Federal Systems Division of
IBM. Sixteen work components have been identified from which the software
organization or the engineering organization involved in a software
development project can structure its particular activities. Data on 9
of them served as the basis for the work reported upon here. This
information was based on experience at the IBM facility in Manassas,
Virginia. These work components are:

Software Requirements Definition - This work component includes the 
definition and/or analysis of functional, operational, and other software 
system requirements. 

Software Development Planning - This work component includes all tasks
needed to generate the plans for the implementation of the
software system.

Functional Design - This work component covers the documentation of the 
functions the software must perform to meet the requirements imposed 
upon it. 

Program Design - This work component covers the documentation of the 
software system from an internal viewpoint. 

Module Development - This work component covers the tasks associated
with the detailed design of the software modules and their coding and
test.


Software Integration and Test (SWIT) - This work component covers the 
integration and testing of the software system and the analysis to 
determine if it meets the system requirements. 

SWIT Problem Analysis and Error Correction - This work component covers 
the analysis and correction of software problems uncovered during SWIT. 

System Test - This work component covers the hardware/software integration 
and test effort. 

Acceptance Test - This work component covers the demonstration to the 
customer that the software system satisfies the requirements imposed 
upon it. 

A cost estimate can be made by considering the nature of the particular 
software development job and the work components (such as program design, 
coding, etc.) that constitute it. Then, the labor (man months) for 
each component is estimated. The sum of these man month figures is the 
amount required for the given job. The labor for each work component 
is estimated as the product of the productivity rate (in man months per
thousand source lines of code = MM/KSLOC) and the number of thousands of
source lines of code. Thus:

Total labor (man months) = $\sum_{i=1}^{n} P_{e_i} \times S = S \sum_{i=1}^{n} P_{e_i}$

where:
  $n$ = number of work components
  $P_{e_i}$ = work rate #$i$ (MM/KSLOC)
  $S$ = amount of source lines of code, in KSLOC.

The approach to considering the software development process as a 
sequence of activities with well-ordered time precedence relationships 
is a model long used by industrial engineers, and has been applied 
recently to modern electronic systems development (3,4). Considering the
development process in terms of its constituents enables the estimator
to achieve a greater degree of intellectual control than if he were to
evaluate the process overall. For example, it may not be clear how the
availability of a new process that facilitates unit testing would impact
overall development productivity. However, its effect on the work
component that covers unit test would be much easier to discern. Then,
the effect on overall productivity can be readily calculated by simply
revising the appropriate rate (e.g., the proper $P_{e_i}$ in the equation
given above).
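The bottom-up estimate, and the way a single revised rate propagates to the total, can be sketched as follows. Only the component names come from the text; the work rates (MM/KSLOC), the 20-KSLOC job size and the revised Module Development rate are hypothetical, and `total_labor` is an illustrative name.

```python
from typing import Mapping


def total_labor(work_rates: Mapping[str, float], ksloc: float) -> float:
    """Sum over work components of P_ei (MM/KSLOC) * S (KSLOC), in man months."""
    return sum(rate * ksloc for rate in work_rates.values())


# Hypothetical rates for a subset of the work components (not IBM FSD data).
rates = {
    "Program Design": 0.8,
    "Module Development": 1.5,   # detailed design, coding, and unit test
    "Software Integration and Test": 0.6,
    "System Test": 0.3,
}
size_ksloc = 20.0
baseline = total_labor(rates, size_ksloc)

# Assume a new unit-test aid lowers only the Module Development rate.
revised = {**rates, "Module Development": 1.2}
improved = total_labor(revised, size_ksloc)

print(f"baseline: {baseline:.1f} MM, revised: {improved:.1f} MM, "
      f"saving: {100 * (baseline - improved) / baseline:.1f}%")
```

Keeping the rates per component means the improvement is expressed where it is easiest to judge (one rate), while the overall effect falls out of the sum.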


The Impact of Four Factors on Work Component Productivities 

Earlier work has considered the effect on overall productivity of 
various factors relating to the complexity of the code to be developed, 
the skills of the software development work force, and other factors 
representative of the software development environment. This

paper provides a quantitative assessment of the impact of several 
significant factors on the work rates of 9 specific work components. 

A linear regression model was structured to relate the values of
work rate in man months per KSLOC (MM/KSLOC), experienced in a reasonably
large number of cases (typically more than 30 data samples), to variables
representative of the factors (organization, software product type,
computer type, and code type) involved in each case. The multiple correlation
coefficient between the MM/KSLOC value and the encoded values of the
variables was determined in each case. The square of this value
times 100 is equal to the percentage of variation in the given cost component
'explainable' by these four variables. Table 1 lists these percentages,
together with the sample size for each of the 9 work components that
were evaluated.
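The paper does not say how the four factors were encoded. A common choice, assumed here, is indicator (dummy) coding: one 0/1 column for each alternative beyond the first, giving 1 + 1 + 2 + 2 = 6 columns for factors with 2, 2, 3 and 3 alternatives. For an ordinary least-squares fit with an intercept, the squared multiple correlation coefficient equals R² = 1 - SS_res/SS_tot, so the percentage explained is 100·R². The sketch is pure Python; the level names, function names and data are hypothetical.

```python
from typing import Dict, List, Sequence


def one_hot(levels: Sequence[str], value: str) -> List[float]:
    """Indicator columns for a categorical value; the first level is the baseline."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (small, non-singular systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def percent_variation_explained(rows: List[List[float]], y: List[float]) -> float:
    """100 * R^2 for an ordinary least-squares fit of y on rows plus an intercept."""
    X = [[1.0] + row for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical level names: the paper gives only the number of alternatives.
LEVELS: Dict[str, List[str]] = {
    "organization": ["org_A", "org_B"],
    "product": ["prod_1", "prod_2"],
    "computer": ["cpu_1", "cpu_2", "cpu_3"],
    "code": ["code_1", "code_2", "code_3"],
}


def encode(case: Dict[str, str]) -> List[float]:
    """Six indicator columns (1 + 1 + 2 + 2) for one case."""
    row: List[float] = []
    for factor, levels in LEVELS.items():
        row += one_hot(levels, case[factor])
    return row
```

With one `encode`d row per case and the observed MM/KSLOC values as `y`, `percent_variation_explained` yields a percentage of the kind tabulated for each work component. The normal equations are solvable only if there are at least seven cases and every level actually occurs in the data.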
Table 1 - Percentage of Variation in Work Rate
Explainable by Four Factors

Work Component                                Percentage of Variation   Number of
                                              in Work Rate Explained    Samples Used
                                              by the Four Factors (1)

Software Requirements                                  15.12                 30
Software Development Plan                              17.81                 38
Functional Design                                      15.53                 45
Program Design                                         38.43                 66
Module Development                                     55.87                 60
Software Integration and Test (SWIT)                   46.90                 51
SWIT Problem Analysis and Error Correction             60.33                 51
System Test                                            26.13                 39
Acceptance Test                                        49.40                 42
Average                                                36.17                 47

(1) Organization (2 alternatives); product type (2 alternatives);
computer type (3 alternatives); code type (3 alternatives).


Table 1 shows that, on a work component basis, the percentage of
variation explained by the four factors averages 36.17%. However, on an
overall project basis, this percentage increases to 39%. This is because
the percentage of variation explained is larger for those work components
which represent a greater proportion of the overall software product
development effort.
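The 36.17% figure is the unweighted mean of the nine Table 1 percentages, as the check below confirms. The paper attributes the higher 39% project-level figure to the components with larger shares of effort having higher percentages, but it does not give those shares. The weights in the sketch are therefore hypothetical, and the weighted result only illustrates the direction of the effect.

```python
from typing import Mapping

# Percentage of work-rate variation explained, from Table 1.
EXPLAINED = {
    "Software Requirements": 15.12,
    "Software Development Plan": 17.81,
    "Functional Design": 15.53,
    "Program Design": 38.43,
    "Module Development": 55.87,
    "Software Integration and Test": 46.90,
    "SWIT Problem Analysis and Error Correction": 60.33,
    "System Test": 26.13,
    "Acceptance Test": 49.40,
}


def weighted_average(values: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Average of values weighted by (for example) each component's share of effort."""
    total = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total


simple = sum(EXPLAINED.values()) / len(EXPLAINED)
print(f"simple average: {simple:.2f}")  # 36.17

# Hypothetical effort shares, larger for coding and integration work.
shares = {k: 1.0 for k in EXPLAINED}
shares.update({"Module Development": 3.0, "Software Integration and Test": 2.0})
print(f"weighted (hypothetical shares): {weighted_average(EXPLAINED, shares):.2f}")
```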

Conclusion 


The methodology of 'bottom-up' or 'micro' software development cost
estimation and analysis has been described. The definitions of the
sixteen cost components used by the IBM Federal Systems Division were
presented. The extent to which knowledge of four factors resolves the
uncertainty in nine of these cost components was presented.
WORK RATE

o WORK RATE IS AN INDICATOR OF PRODUCTIVITY WHICH
USES SOURCE LINES OF CODE (SLOC) AS THE MEASURABLE

LABOR (MAN MONTHS) = WORK RATE (MM/SLOC) × WORK (SLOC)
SOFTWARE WORK COMPONENTS


o SOFTWARE REQUIREMENTS DEFINITION
o SOFTWARE DEVELOPMENT PLANNING
o FUNCTIONAL DESIGN
o PROGRAM DESIGN
o MODULE DEVELOPMENT
o SOFTWARE INTEGRATION AND TEST
o PROBLEM ANALYSIS AND ERROR CORRECTION
o SYSTEM TEST
o ACCEPTANCE TEST



ESTIMATION METHODOLOGY

TOTAL LABOR (MAN MONTHS) = $\sum_{i=1}^{N} P_i \times S = M$

WHERE:

M = MAN MONTHS

N = NUMBER OF WORK COMPONENTS

$P_i$ = WORK RATE #i

S = NUMBER OF SOURCE LINES OF CODE
THE FOUR FACTORS
WHOSE EFFECT WAS ANALYZED


o ORGANIZATION    (2 ALTERNATIVES)
o PRODUCT TYPE    (2 ALTERNATIVES)
o COMPUTER TYPE   (3 ALTERNATIVES)
o CODE TYPE       (3 ALTERNATIVES)
[FIGURE: MM/KSLOC (SIMULATED)]
PERCENTAGE OF VARIATION IN WORK RATE
EXPLAINABLE BY FOUR FACTORS

WORK COMPONENT                          PERCENTAGE OF VARIATION   NUMBER OF
                                        IN WORK RATE EXPLAINED    SAMPLES USED
                                        BY THE FOUR FACTORS

SOFTWARE REQUIREMENTS                            15.12                 30
SOFTWARE DEV. PLAN                               17.81                 38
FUNCTIONAL DESIGN                                15.53                 45
PROGRAM DESIGN                                   38.43                 66
MODULE DEVELOPMENT                               55.87                 60
SOFTWARE INTEGRATION AND TEST (SWIT)             46.90                 51
SWIT-PROBLEM ANALYSIS AND ERROR CORRECTION       60.33                 51
SYSTEM TEST                                      26.13                 39
ACCEPTANCE TEST                                  49.40                 42
AVERAGE                                          36.17
WEIGHTED AVERAGE                                 39.00





SUMMARY


o DESCRIBED WORK COMPONENT APPROACH TO ESTIMATION

o ASSESSED IMPACT OF FOUR FACTORS ON WORK RATE

o DETERMINED THAT THESE FOUR FACTORS ACCOUNTED FOR
39% OF THE VARIABILITY IN THE OVERALL WORK RATE

o EXPLAINED WHY THE RESULTS DEMONSTRATE THE POWER
OF THE WORK COMPONENT APPROACH
SOFTWARE METRICS:

SOFTWARE QUALITY METRICS FOR DISTRIBUTED SYSTEMS 

by 

Jonathan V. Post 
Boeing Aerospace Company 


ABSTRACT 

Recent publication of numerous books and papers indicates
the growing importance of Software Quality Metrics [1]. Studies
at the Boeing Aerospace Company [2,3] have extended this field to
cover Distributed Computer Systems. Emphasis is placed on
studying Embedded computer systems, and on viewing them within
a system life cycle [4]. The approach of J. A. McCall et al.
[5,6] at General Electric was pursued and extended, maintaining
the hierarchy of quality factors, criteria, and metrics [fig. 1].
New software quality factors have been added, including
Survivability, Expandability, and Evolvability [fig. 2].

KEYWORDS

Software, Quality, Metrics, Distributed, Survivability, Life Cycle,
Expandability, Evolvability, Virtuality

INTRODUCTION 

What is a distributed computer system? Enslow [7] requires
such a system to meet five criteria, while LeLann [8] requires it
to be a collection of entities participating in system performance.
Mauchly and Eckert built the first distributed computer,
BINAC, circa 1947, and acknowledged [9] that the structure of the
human brain, with its two cerebral hemispheres, was a guiding
design metaphor. Dr. Roger Sperry's Nobel Prize in Medicine was
for experiments performed at Caltech which established that the
human brain is a distributed computer [10]. We consider a distributed
system to be formed by the interconnection of potentially
autonomous systems to accomplish system functions cooperatively.


There are several ways the term "distributed" may be inter- 
preted. Data may be distributed, processors may be distributed, 
processes may be distributed, users may be distributed, communi- 
cations may link geographically dispersed clusters of components, 
or some combination of these strategies may be imposed on system 
architecture. Each of these types of distributedness leads to 
design tradeoffs, and to qualitative distinctions between cen- 
tralized and distributed systems. No single model allows 
analysis of all such tradeoffs; data is either specialized, anec- 
dotal, or condensed to "lessons learned" or scenario form. The 
application of Software Quality Metrics should help to provide a 
unifying framework for all such distributed systems. As Norbert
Wiener first emphasized [11], it is possible to build a reliable
system out of unreliable parts. 

It will be increasingly important to understand distributed 
computer systems. Some of their characteristics will emerge more 
extensively in future configurations. One characteristic peculiar
to distributed systems, and of importance in the 1980's, is "Geographic
Dispersion." The extent to which computers within a distributed
system can be physically displaced from each other ranges from
the centimeter to the multi-thousand-kilometer. Computers
will indeed be "tightly-coupled" over intercontinental
distances by fiber-optics technology currently under research.
This technology complements that of the communications satellite. 
Interconnection of even a very small percentage of available com- 
puters will be able to form distributed systems of complexity 
beyond those of today, since by 1999 there will be on the order 
of one billion computers in the world [13]. 

QUALITY METRICS APPROACH 

The approach chosen to evaluate distributed systems is the 
Software Quality Metrics methodology, which has been fruitfully 
applied to the study of a broad range of uniprocessor computers 
and embedded computer systems [1]. Since the 1970's, additional
factors have been judged necessary in evaluating the performance
of software and systems besides that of classic Reliability, which
was a factor closely identified with software and system quality.
McCall and others [5,6] identified eleven software quality factors
and developed a system of metrics to predict and assess the
degree of presence of these factors. As shown in fig. 1, each
factor is composed of a number of criteria which are further broken
down into quantitative metrics. The eleven factors identified are:
Correctness, Reliability, Efficiency, Integrity, Usability,
Maintainability, Testability, Flexibility, Portability, Reusability,
and Interoperability. The extension of this approach to distributed
systems was introduced at last year's workshop by Robert W.
Lawler of Boeing Aerospace Company [15]. The research conducted
during the past year, as reported to RADC [2], has concentrated on
identifying unique characteristics of distributed systems, and on
definition or redefinition of factors and criteria which can
measure these characteristics. Three new software factors, four new
system factors, twelve new software criteria, and two new system
criteria have been described, and the factor of Testability has
been generalized into the factor of Verifiability. Examples of 
these new factors and criteria are described below and in fig. 2. 

DISTRIBUTED SYSTEM CHARACTERISTICS 

How do we approach the identification of the characteristics 
of distributed systems? Distributed System characteristics are 
identified and classified, along with rationales for the 
selection of Distributed Systems. Fifty-eight rationales are grouped
into nine reasons in fig. 3. The rationales given for selection of a
distributed system over a uniprocessor system indicate the 
characteristics which people imagine distributed systems, as a 
whole, exhibit. No one system meets more than a fraction of 
these identifications, just as no system life cycle for a distri- 
buted system quite fits into the system life cycle models for 
uniprocessor systems. Instead, we find the distributed system to 
be distributed through time in a distributed life cycle of con- 
current phases of Operation, Revision, and Transition [fig. 4]. 

NEW QUALITY FACTORS 

The main difference between software metrics for a distri- 
buted system and software metrics for a uniprocessor system is 
that the quality of software in a distributed environment depends 
upon the design and performance characteristics of the entire 
system. We therefore distinguish between Software Quality Factors 
and System Quality Factors, although these have impact upon each 
other. The quality factor of Survivability, for example, re- 
flects system performance when one or more nodes or communication 
links become totally nonoperational. The concepts of Reliability
and Redundancy in a uniprocessor are not broad enough to describe
Survivability.

Survivability is a factor which measures the capability of a 
system to operate when one or more components are destroyed. For 
a nondistributed system, Survivability is not a very meaningful
measure. A single unit computer, depending on the degree of
hardening and the damage received in the tactical environment, 
will usually either continue to operate, or else be completely 
incapacitated. For a geographically dispersed system, it is 
desirable that damage or destruction of individual components 
shall allow the system to continue functioning, albeit at a some- 
what lower level of performance. Survivability, then, might 
measure the likelihood of a distributed system to exhibit this
"graceful degradation." The five criteria within the system quality
factor of Survivability are Autonomy, Distributedness, Anomaly
Management, Modularity, and Reconfigurability (see fig. 2).

Distributed Systems also require metrics to evaluate the capacity
of expanding and upgrading the system, so we have identified
and defined the corresponding factors of Expandability and
Evolvability. Expandability is the extent to which the system
capability can be expanded to enhance current functions or to add
new functions. The criteria within the factor of Expandability
include: Virtuality, Generality, Modularity, Augmentability,
Clarity, Specificity, and Simplicity. Evolvability is the extent
to which the system performance could be enhanced by the
incorporation of new technology. Criteria within Evolvability are
Virtuality, Generality, Modularity, Clarity, Specificity, and
Simplicity. In addition, we have defined four new system quality
factors: Availability, Safety, Transportability, and
Interchangeability.
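The factor/criterion hierarchy lends itself to a simple data structure. The sketch below records the criteria listed above for the three new factors and rolls metric scores (assumed normalized to 0..1) up through them. The unweighted averaging is an assumption made purely for illustration, since the paper does not say how metric scores are combined, and `factor_score` is a hypothetical name. Criteria such as Modularity are shared between factors, so the structure is a graph rather than a tree.

```python
from statistics import mean
from typing import Dict, List

# Factor -> criteria, as listed in the text.
FACTOR_CRITERIA: Dict[str, List[str]] = {
    "Survivability": ["Autonomy", "Distributedness", "Anomaly Management",
                      "Modularity", "Reconfigurability"],
    "Expandability": ["Virtuality", "Generality", "Modularity", "Augmentability",
                      "Clarity", "Specificity", "Simplicity"],
    "Evolvability": ["Virtuality", "Generality", "Modularity", "Clarity",
                     "Specificity", "Simplicity"],
}


def factor_score(factor: str, metric_scores: Dict[str, List[float]]) -> float:
    """Roll metric scores up to one factor.

    Hypothetical scheme: each criterion scores as the mean of its metrics,
    and the factor as the unweighted mean of its criteria.
    """
    criterion_scores = []
    for criterion in FACTOR_CRITERIA[factor]:
        scores = metric_scores.get(criterion)
        if not scores:
            raise KeyError(f"no metric scores for criterion {criterion!r}")
        criterion_scores.append(mean(scores))
    return mean(criterion_scores)


# Hypothetical metric scores; one weak criterion drags Survivability down.
scores = {c: [0.8, 0.6] for crits in FACTOR_CRITERIA.values() for c in crits}
scores["Reconfigurability"] = [0.2]
for factor in FACTOR_CRITERIA:
    print(f"{factor}: {factor_score(factor, scores):.2f}")
```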


NEW CRITERIA 

Twelve new software criteria were identified during investi- 
gation of characteristics for distributed systems [2]. These 
criteria are: Compliance, Validity, Clarity, Specificity,
Virtuality, Comprehensibility, Reconfigurability, Distributedness,
Autonomy, Supportability, Augmentability, and Compatibility
[fig. 5]. In addition, two new system criteria were identified:
Self-containedness (an attribute of Transportability) and
Homogeneity (an attribute of Interchangeability). A majority of these
system and software criteria are applicable to uniprocessors as
well. The following brief discussion of one of the new software
criteria, Virtuality, shows how the entire system, including the
human users, needs to be measured to evaluate the system quality.

For Distributed Systems, there is a new criterion within the 
quality factor of Usability. We refer to this criterion as
Virtuality. The structure of a distributed system can be quite
complex, and it is not always desirable for the user to be apprised
of this structure. The user may perceive the system in terms of 
a virtual architecture, and be shielded from knowing the actual 
internal representation and location of data. 

Virtuality is a measure of the extent to which the system 
appears to the user as it is intended to appear to the user. The 
user (or users) of a system is not expected or intended to see 
the system's logical, topological, or physical structure. In- 
stead, an abstract "virtual" system is designed. The "real" sys- 
tem supports, emulates, and embodies the designed appearance and 
"feel" of the virtual system. 

Theodor H. Nelson [12] explains the relationship between 
Virtuality and other criteria such as Conceptual Simplicity, 
Machine Independence, and Network File Availability. "Our ap- 
proach to computer design we call 'the design of virtuality.' By 
virtuality we mean the seeming of an object or system, its con- 
ceptual structure, its atmospherics and its feel.... What counts 
is effects, not techniques.... The design of an interactive com- 
puter environment, similarly, should not be based on particular 
hardware, or a particular display device, or a programming
technique.... the systems analysis for an interactive system should
deal with the mental space of the user's experience." 

Virtuality also measures the subjective component of the 
user interface. In the special case of flight training simula- 
tors [14], the "feel" of the system has long been regarded as 
crucial to Usability. "Feel" is evaluated by expert pilots
(superusers). This goes beyond Human Engineering, which concentrates
on one display/sensory modality at a time, or on total
bits per second. "Feel", and therefore Virtuality, involves gestalt
perception, with an emphasis on right-brain holistic activity.
Virtuality, and the human brain, cannot be ignored when
studying distributed systems. 

NEW METRICS 

During the next year of this research effort there will be a 
set of metrics developed within the criteria and factors discuss- 
ed above. The existing metrics [6] will be added to, deleted, and 
modified in accordance with results to date. The work yet to be 
performed may be summarized as follows: 

(1) Select Quality Metrics for Validation (Identify those metrics 
that will make the greatest contribution to validating the quali- 
ty measurement framework previously developed); 

(2) Develop Scenarios and Collect Data (Design the data collection 
methodologies and gather relevant data from Boeing Aerospace Com- 
pany projects which use distributed embedded computer systems); 

(3) Validate Metrics (Validation techniques consistent in concept 
and methodology with McCall et al. [6], but with multivariate
regression analysis and other numerical analysis and correlation
methods; conduct interviews with engineering and management
personnel to supplement empirical data);

(4) Produce a Report and Handbook (Final Report to be published 
by RADC. A Handbook will be prepared that describes the step- 
by-step procedures required to implement the quality meas- 
urements for distributed systems). 

SUMMARY 

Software Quality Metrics may be applied to the evaluation of 
distributed computer systems. Exactly what constitutes a distributed
system is disputed in the literature. Distributed systems have been
built in various configurations for thirty years, but the human brain
shares some of the characteristics of these systems and provides
a valuable model. The approach of McCall et al., with factors,
criteria, and metrics, has been extended. New factors and new 
criteria have been defined. New metrics will be devised and val- 
idated as the research described in this paper is continued. 
Figure 1 Software Quality Model

FACTOR: USER-ORIENTED VIEW OF PRODUCT QUALITY
CRITERIA: SOFTWARE-ORIENTED ATTRIBUTES WHICH INDICATE QUALITY
METRICS: QUANTITATIVE MEASURES OF ATTRIBUTES
Figure 2 Relationship of Criteria to Software Quality Factors (** = Different)
Figure 3 Relationship Between Reasons, Rationales, and
System Quality Factors (page 1 of 2) 
REASON NO.   REASONS FOR SELECTION OF DISTRIBUTED SYSTEMS

6   IMPROVE THRUPUT
    • DISTRIBUTE JOBS TO SEVERAL NODES CONCURRENTLY
    • EXPLOITATION OF UNIFORM INTERCHANGE MEDIA
    • ENHANCED DATA PARALLELISM
    • ENHANCED COMPUTATIONAL PARALLELISM
    • OPTIMAL PARTITIONING OF WORKLOAD
    • REDUCE LOAD ON HOST
    • DISTRIBUTED OPERATING SYSTEM
    • ELIMINATE MULTIPROGRAMMING

7   IMPROVE SURVIVABILITY
    • SECURITY ON HIERARCHICAL NETWORK
    • SYSTEM PROTECTION FROM OVERLOAD
    • BACKUP REDUNDANCY
    • RESTORATION/RECOVERY
    • ENDURANCE/HARDENING

8   IMPROVE SENSOR PERFORMANCE
    • DISTRIBUTED SENSORS
    • DISTRIBUTED EFFECTORS
    • INTELLIGENT SENSOR CLUSTERS
    • DEPLOYABLE SENSOR ARRAYS
    • CONCURRENT MULTI-SPECTRAL SCANNING

9   IMPROVE GEOGRAPHIC DISPERSION
    • USER DISTRIBUTION
    • GATEWAY TO NATIONAL/INTERNATIONAL NETWORK
    • GLOBAL C3I APPLICATIONS
    • SPACE SYSTEMS NETWORKS
    • NEED FOR MOBILE NODES
    • NEED FOR DISTRIBUTED DATABASE MANAGEMENT
    • ADAPTIVE ROUTING
Figure 3 Relationship Between Reasons, Rationales, and
System Quality Factors (page 2 of 2) 
Figure 4 Quality Life Cycle Scheme

ACTIVITIES: PRODUCT OPERATION, PRODUCT REVISION, PRODUCT TRANSITION

QUALITY FACTOR      USER CONCERN
CORRECTNESS         DOES IT DO WHAT IT'S SUPPOSED TO?
RELIABILITY         WHAT CONFIDENCE CAN BE PLACED IN WHAT IT DOES?
EFFICIENCY          HOW WELL DOES IT UTILIZE THE RESOURCES?
INTEGRITY           HOW SECURE IS IT?
USABILITY           HOW EASY IS IT TO USE?
SURVIVABILITY*      HOW WELL WILL IT PERFORM UNDER ADVERSE CONDITIONS?
MAINTAINABILITY     CAN IT BE REPAIRED?
VERIFIABILITY*      CAN ITS OPERATION AND PERFORMANCE BE VERIFIED?
FLEXIBILITY         CAN IT BE CHANGED?
PORTABILITY         CAN IT BE USED IN ANOTHER ENVIRONMENT?
REUSABILITY         CAN IT BE USED IN ANOTHER OR DIFFERENT APPLICATION?
INTEROPERABILITY    CAN IT BE INTERFACED WITH ANOTHER SYSTEM?
EXPANDABILITY*      CAN ITS CAPABILITY BE EXPANDED?
EVOLVABILITY*       CAN ITS PERFORMANCE BE UPGRADED WITH NEW TECHNOLOGY?

* = NEW
Figure 5 Software Quality Criteria Definitions
CRITERION

DEFINITION 

•TRAINING 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE TRANSITION FROM CURRENT 
OPERATION OR PROVIDE INITIAL FAMILIARIZATION. 

•COMMUNICATIVENESS 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE USEFUL INPUTS AND OUTPUTS 
WHICH CAN BE ASSIMILATED.

• OPERABILITY 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH DETERMINE OPERATIONS AND
PROCEDURES CONCERNED WITH THE OPERATION OF THE SOFTWARE. 

• MODULARITY 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE A STRUCTURE OF HIGHLY 
COHESIVE MODULES WITH OPTIMUM COUPLING. 

• RECONFIGURABILITY* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR CONTINUITY OF SYSTEM 
OPERATION WHEN ONE OR MORE PROCESSORS, STORAGE UNITS, OR COMMUNICATIONS
LINKS FAIL. 

• DISTRIBUTEDNESS* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH DETERMINE THE DEGREE TO WHICH 
SOFTWARE FUNCTIONS ARE GEOGRAPHICALLY OR LOGICALLY SEPARATED WITHIN THE 
SYSTEM. 

•AUTONOMY* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH DETERMINE ITS DEPENDENCY ON 
INTERFACES. 

•CONCISENESS 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR IMPLEMENTATION OF A 
FUNCTION WITH A MINIMUM AMOUNT OF CODE. 

• SUPPORTABILITY* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR EASE IN CREATION OF NEW 
SOFTWARE VERSIONS (e.g., USE OF HOL, VERSION UPDATE SCHEME).

• SELF-DESCRIPTIVENESS 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE EXPLANATION OF THE 
IMPLEMENTATION OF A FUNCTION. 

•GENERALITY 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE BREADTH TO THE FUNCTIONS 
PERFORMED. 

• INDEPENDENCE** 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH DETERMINE ITS DEPENDENCY ON THE 
SOFTWARE ENVIRONMENT (COMPUTING SYSTEM, OPERATING SYSTEM, UTILITIES,
INPUT/OUTPUT ROUTINES. LIBRARIES). 

• AUGMENTABILITY* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE EXPANSION CAPABILITY FOR 
FUNCTIONS AND DATA. 

•COMPATIBILITY* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE INTERFACE PROTOCOLS AND 
ROUTINES THAT ARE APPROPRIATE TO THE INTERFACE EQUIPMENT FEATURES AND
CAPABILITIES. 

• COMMONALITY** 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR THE USE OF INTERFACE 
STANDARDS FOR PROTOCOLS, ROUTINES, AND DATA REPRESENTATIONS.


* = New
** = Different

Figure 5. Software Quality Criteria Definitions (* = New)


CRITERION          DEFINITION

•TRACEABILITY 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE A THREAD OF ORIGIN FROM THE 
IMPLEMENTATION TO THE REQUIREMENTS WITH RESPECT TO THE SPECIFIED
DEVELOPMENT ENVELOPE AND OPERATIONAL ENVIRONMENT. 

•CONSISTENCY 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR UNIFORM DESIGN AND 
IMPLEMENTATION TECHNIQUES AND NOTATION.

• COMPLETENESS 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FULL IMPLEMENTATION OF THE 
FUNCTIONS REQUIRED.

• COMPLIANCE* 

• THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROMOTE IMPLEMENTATIONS THAT 
CONFORM TO THE REQUIREMENTS.

• VALIDITY* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH CONSTRAIN IMPLEMENTATIONS TO A 
RANGE OF ACCEPTABLE SOLUTIONS. 

• CLARITY* 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE NON-AMBIGUOUS DESCRIPTIONS
OF FUNCTIONS AND IMPLEMENTATIONS. 

• SPECIFICITY* 

• THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE SINGULARITY IN THE 
DEFINITION AND IMPLEMENTATION OF FUNCTIONS. 

• SIMPLICITY 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR THE DEFINITION AND 
IMPLEMENTATION OF FUNCTIONS IN THE MOST NON-COMPLEX AND UNDERSTANDABLE 
MANNER. 

• ANOMALY MANAGEMENT**

• THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR CONTINUITY OF 
OPERATIONS UNDER AND RECOVERY FROM NON-NOMINAL CONDITIONS. 

• ACCURACY 

• THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE THE REQUIRED PRECISION IN
CALCULATIONS AND OUTPUTS. 

• EFFECTIVENESS** 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR MINIMUM UTILIZATION OF 
RESOURCES (PROCESSING TIME, STORAGE, OPERATOR TIME) IN PERFORMING
FUNCTIONS. 

• ACCESSIBILITY** 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE FOR CONTROL AND AUDIT OF 
ACCESS TO THE SOFTWARE AND DATA. 

• VIRTUALITY* 

• THOSE ATTRIBUTES OF THE SOFTWARE WHICH PRESENT A SYSTEM THAT DOES NOT 
REQUIRE USER KNOWLEDGE OF THE PHYSICAL CHARACTERISTICS (e.g., NUMBER OF
PROCESSORS/DISKS, STORAGE LOCATIONS).

• VISIBILITY** 

•THOSE ATTRIBUTES OF THE SOFTWARE WHICH PROVIDE STATUS MONITORING OF THE 
DEVELOPMENT AND OPERATION (e.g., INSTRUMENTATION).

• COMPREHENSIBILITY* 

• THOSE ATTRIBUTES OF THE SOFTWARE WHICH ENHANCE UNDERSTANDING OF THE 
OPERATION OF THE SOFTWARE. 


INTRODUCTION


The purpose of this presentation is to describe and demon- 
strate a large-scale, systematic procedure for identifying 
and evaluating measures that meaningfully characterize one 
or more elements of software development. The background of 
this research, the nature of the data involved, and the 
steps of the analytic procedure are discussed. The presen- 
tation concludes with an example of the application of this 
procedure to data from real software development projects. 

As the term is used here, a measure is a count or numerical
rating of the occurrence of some property. Examples of
measures include lines of code, number of computer runs,
person-hours expended, and degree of use of top-down design 
methodology. Measures appeal to the researcher and the man- 
ager as a potential means of defining, explaining, and pre- 
dicting software development qualities, especially 
productivity and reliability. 

Measures may be classified into four groups as illustrated
by the software development model presented in Figure 1. It
shows these components: a problem, a solution-generating
process, the environment in which that process takes place,
and the solution (or software product). Measures can be
employed to characterize the components of this model and to 
show their interrelationships. Some examples of appropriate 
measures for each component are also shown in the figure. 

The Goddard Space Flight Center (GSFC) Software Engineering 
Laboratory (SEL) is engaged in an effort, part of which this 
presentation describes, to develop a concise set of such 
characteristic measures. The SEL and its activities are 
discussed in more detail in Reference 1. 

Figure 1. SEL Software Development Model

The approach to software measurement adopted in this presentation is different from that generally followed. The usual
procedure is to select high-level "qualities" and then to 
seek numerical criteria or measures of these qualities. 
McCall (Reference 2) has developed a comprehensive system of 
such qualities and appropriate measures. However, the goal 
of the approach followed here is to identify the qualities 
being measured by the data collected rather than to attempt 
to associate measures with previously specified qualities. 
The measures considered in this analysis are described in 
the next section. 

DATA DESCRIPTION


Clearly, the number of potentially useful measures is large; 
the SEL has selected more than 200 for study. These meas- 
ures cover the entire range of software development activity 
as experienced by the SEL. However, the analysis described 
here will focus on the relationships among measures of the 
process and product components of the software development 
model (see Figure 1).

Therefore, a data subset containing only the 60 measures 
relevant to those two components was used. The measures (or 
variables) used are listed in Table 1 (see Appendix A).

This list does not necessarily exhaust the possibilities for 
measures in those areas; however, this group of measures is 
believed to form a comprehensive set. The process measures 
class is represented by three subclasses: methodology
(Table 1a), tools (Table 1b), and documentation (Table 1c).
Note that the methodology class is further subdivided by
development phase into design, code, and test measures. The
product class (Table 1d) includes size and resource measures.

The data used in this analysis were collected by the SEL 
from 22 actual medium-scale, scientific software development 
projects. Values for all these measures were determined for 
each project. The values are ratings of the degree of use, 
counts, or rates per line of code, as indicated in Table 1. 
Degree-of-use process measures are expressed as relative
scores on a scale from zero to five. The exact derivation
of these scores will be explained in a forthcoming SEL document (Reference 3).

ANALYTIC PROCEDURE


The 60 measures just described are not unique or inde- 
pendent. Some may, in fact, measure the same or related 
qualities. The object of the analytic procedure is to 
identify the most basic set of qualities (or properties) 
being measured by the group of 60. A "basic" quality is 
defined to be one that is independent of all other such 
qualities. This subset, then, defines the basic quality 
characteristics describing the projects from which the data 
were obtained. 

The procedure to be proposed is "large scale." That is, it 
is appropriate when a large number of measures (or vari- 
ables) are to be evaluated. The researcher interested in 
studying the relationships of only a few specific measures 
can probably get better results from regression and hypoth- 
esis testing techniques. Nevertheless, this procedure can 
be useful as a screening tool for detecting confounding ef- 
fects in the data before selecting other statistical tech- 
niques. 

The analytic procedure followed in this experiment has two 
steps, as indicated in Figure 2. These are the application
of a test of normality to the candidate measures (data),
followed by a factor analysis of those not rejected by the
test. The result of this procedure is a descriptive, rather 
than a predictive, model of the data. The procedure iden- 
tifies the descriptive factors common to the set of meas- 
ures. Thus, the original measures are organized into a 
number of groups (or factors) smaller than the number of 
measures input to the procedure. These factors correspond 
to the basic qualities sought for in the data. The steps of 
this procedure are discussed in more detail in the following 
sections.

Figure 2. Analytic Procedure


TEST OF NORMALITY


The test of normality analyzes the probability distribution
of a measure. The observed values of each measure are dis- 
tributed over some range. The normal distribution is 
readily identifiable in Figure 3. The test of normality 
will detect measures whose values are distributed in a pat- 
tern significantly different from the normal. For example, 
it would reject a measure with values clustered at one end 
of the range (skewed) rather than distributed symmetrically 
across it. 

This is not a very powerful test. It will accept any approximately symmetrical distribution even if that distribution is not truly normal. However, the test is important
because approximate normality of the data is an assumption 
of step two, the factor analysis. 

Six measures from the set of 60 were rejected by the test of
normality using the 0.05 level of significance. These are 
measures of techniques for which insufficient examples of 
use were available. Consequently, most projects had scores 
of zero for these degree-of-use measures, a result that produced dramatically skewed distributions. They are

• HIPO Design Technique 

• Verification and Validation Team (two measures) 

• Requirements Language Tool 

• Configuration Management Tool 

• Unit Development Folders

These measures could, however, be used in some other types 
of analyses not considered here. 
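The text does not name the normality test that was applied. As a hypothetical stand-in, the sketch below screens one measure with a large-sample skewness z-test at the 0.05 level. This test targets exactly the failure described above: scores piled up at zero. With only 22 projects the normal approximation to the skewness statistic is rough, and a Shapiro-Wilk test would be a more usual choice. The function name `skewness_screen` and the sample scores are illustrative.

```python
import math
from statistics import NormalDist


def skewness_screen(values, alpha=0.05):
    """Two-sided large-sample skewness test; returns (z, rejected).

    Hypothetical stand-in for the unnamed normality test: it flags a
    measure whose values pile up at one end of their range.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    # Sample standard deviation with the n - 1 denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("constant measure: skewness is undefined")
    # Adjusted Fisher-Pearson skewness coefficient G1.
    g1 = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)
    # Standard error of G1 for a normal sample of size n.
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    z = g1 / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z, abs(z) > critical


# Made-up degree-of-use scores: a technique used on only 4 of 22 projects.
rarely_used = [0] * 18 + [5, 5, 4, 3]
# Made-up scores spread symmetrically over the 0-5 scale.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(skewness_screen(rarely_used))  # large positive z, rejected
print(skewness_screen(symmetric))    # z near 0, not rejected
```

A skewness test deliberately ignores tail shape. Like the test described in the text, it will pass any roughly symmetric distribution.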

FACTOR ANALYSIS


The 54 remaining measures were included in the factor anal- 
ysis. The goal of the factor analysis is to "discover" the 
underlying structure of the data. Factor analysis hypoth- 
esizes the existence of a set of statistically independent 
"factors" that are not directly measurable by the experi- 
menter. Measures (or variables) are the quantities that are 
observed in practice. However, the apparent correlations 
among measures can be interpreted to be due to their joint 
correlation with common factors (see Figure 4). That is, 
two or more measures correlated with the same factor will be 
correlated with each other. The desirable result of a 
factor analysis is the extraction of a smaller set of fac- 
tors whose relationships are known (they are independent) 
from the larger set of measures whose relationships are more
complex. 
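A small simulation can check the claim that measures correlated with the same factor are correlated with each other. This is a hypothetical one-factor model; the loadings 0.8 and 0.7 are made up. Each measure is its loading times an unobservable standard-normal factor, plus independent noise scaled to give unit variance. The expected correlation between the two measures is therefore the product of the loadings, 0.56. The function names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_one_factor(loadings, n_cases, seed=0):
    """Draw n_cases observations of measures x_j = a_j * F + e_j.

    F and the noise terms are independent standard normals; the noise is
    scaled so each measure has unit variance, giving corr(x_j, x_k) = a_j * a_k.
    """
    rng = random.Random(seed)
    columns = [[] for _ in loadings]
    for _ in range(n_cases):
        f = rng.gauss(0.0, 1.0)  # the unobservable common factor
        for col, a in zip(columns, loadings):
            noise = rng.gauss(0.0, 1.0) * math.sqrt(1.0 - a * a)
            col.append(a * f + noise)
    return columns


x1, x2 = simulate_one_factor([0.8, 0.7], n_cases=50_000)
print(round(pearson(x1, x2), 2))  # close to 0.8 * 0.7 = 0.56
```

Neither measure is the factor itself, yet the factor fully explains their correlation. This is the structure that factor analysis tries to recover from the observed correlations.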

Consider this example of the factor concept. The number of 
errors in a piece of software and its mean time to failure 
are measures related to reliability and are correlated with 
each other. However, neither measure by itself is a full 
description of reliability. Such things as the location of 
the error and the severity of the failure must also be con- 
sidered. Therefore, the reliability quality factor is not 
directly measurable although a number of measurable vari- 
ables are correlated with it. 

A successful factor analysis will explain such groups of 
related measures. Thus, each factor defined will correspond 
to a distinct basic quality being measured by the original 
set of variables. These qualities are the sources of varia- 
tion (or differentiation) among the projects studied. 
NOTE: VARIABLES MAY BE CORRELATED.
FACTORS ARE INDEPENDENT.

Figure 4. Concept of Factor Analysis

The principles of factor analysis are explained in detail in
the text by Harman (Reference 4). A number of software implementations of factor analysis are available. The specific software used in this analysis was the principal
components factor procedure of the Statistical Analysis System (Reference 5).
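The SAS procedure is not reproduced here. The sketch below is a minimal illustration of what a principal components extraction does, assuming the input is already the correlation matrix of the standardized measures:

- The factors are the eigenvectors of that matrix.
- Each factor's share of the total variance is its eigenvalue divided by the number of measures.
- A measure's correlation (loading) with a factor is the matching eigenvector entry times the square root of the eigenvalue.

The cyclic Jacobi method used for the eigen-decomposition is one standard choice for small symmetric matrices; SAS may do this differently internally. The function names and the 3-by-3 example matrix are hypothetical.

```python
import math


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and unit eigenvectors of a symmetric matrix (cyclic Jacobi).

    Returns (values, vectors), where vectors[k] belongs to values[k].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-18:
                    continue
                # Rotation chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]
    return values, vectors


def principal_factors(corr, n_factors):
    """Principal components extraction from a correlation matrix.

    Returns (eigenvalues, variance portions, loadings) for the n_factors
    largest eigenvalues; loadings[f][j] is measure j's correlation with factor f.
    """
    values, vectors = jacobi_eigen(corr)
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)[:n_factors]
    p = len(corr)  # total variance of p standardized measures
    eig = [values[k] for k in order]
    portion = [lam / p for lam in eig]
    loadings = [[x * math.sqrt(max(values[k], 0.0)) for x in vectors[k]] for k in order]
    return eig, portion, loadings


# Hypothetical correlations: measures 1 and 2 move together, measure 3 does not.
R = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
eig, portion, load = principal_factors(R, n_factors=2)
print([round(x, 3) for x in eig])      # [1.8, 1.0]
print([round(x, 3) for x in portion])  # [0.6, 0.333]
```

Eigenvector signs are arbitrary. The sign of a loading is therefore only meaningful relative to the other loadings on the same factor.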

SUMMARY OF RESULTS


Further analysis of the 54 process and product measures that 
passed the test of normality produced a factor model con- 
taining 5 factors that explained 77 percent of the variance 
of the original measures. The meaning of each factor is 
determined by examining the measures that are closely cor- 
related with it. These factors and the amount of variance 
accounted for by each are as follows: 

• Methodology Intensity (31%)

• Project Size (25%)

• Computer Usage (9%) 

• Quality Assurance (8%) 

• Change Rate (5%) 

The variance associated with a factor is a measure of the 
degree to which that factor differentiates among the pro- 
jects (or cases) studied. Thus, it is a measure of informa- 
tion content. A larger portion of the total variance could 
have been accounted for by using a larger number of fac- 
tors. The relationship of the number of factors to the var- 
iance explained by the factor model is illustrated in 
Table 2 of Appendix A. The interpretation of additional 
factors is difficult because none of the original measures 
are highly correlated with them. Therefore, they are not 
included in this preliminary definition of the factor model. 

The correlations of the original measures with the five fac- 
tors are listed in Table 3 of Appendix A. Only correlations 
greater than 0.526 in magnitude (the 0.01 level of significance) are reproduced. The measure showing the highest correlation with
a factor can be taken as the best estimator of that quality
factor from among the original measures included in the
analysis. These "best" estimators are indicated by asterisks in the tables.
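The text does not say how the 0.526 cut-off was obtained. One standard route, offered here only as a plausible reconstruction, converts a critical t value into a critical correlation with r = t / sqrt(t^2 + df). The tabulated two-sided 1 percent t value for 21 degrees of freedom (2.831) reproduces 0.526. The more common choice for 22 projects, n - 2 = 20 degrees of freedom (t = 2.845), would give about 0.537. The function name `critical_r` is illustrative.

```python
import math


def critical_r(t_crit, df):
    """Smallest |r| that is significant at the level whose critical t is t_crit."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-sided 1 percent critical t values from standard t tables.
print(round(critical_r(2.831, 21), 3))  # 0.526
print(round(critical_r(2.845, 20), 3))  # 0.537
```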

Remember that, although the factors are mutually independent, any given measure may be correlated with more than
one factor and/or with other measures. The factor model 
does, however, identify the strongest relationships in the 
data. Some specific observations are made below about each 
of the factors defined by the analysis. 

Factor 1 - The first and most powerful factor (Table 3a in 
Appendix A) is highly correlated with degree-of-use process
measures; thus, this factor may be interpreted to represent 
the degree to which formal methodology was applied during 
development. The most strongly correlated measure, methodology reinforcement (the extent to which adherence to specified methodologies was enforced by management), supports
this interpretation. The strong correlation of so many 
methodology, tool, and documentation measures with a common 
factor suggests that simple regression and hypothesis test- 
ing techniques are inappropriate for analyzing such effects 
because of their inability to isolate the action of a single 
technique from among the actions of other techniques. 

Factor 2 - The second factor (Table 3b in Appendix A) is 
clearly related to the size of the development effort and 
product. Its "best" estimator is person-hours. The correlation of top-down coding with this factor illustrates the
descriptive, rather than predictive, nature of factor anal- 
ysis. The proper conclusion based on this observation is 
that more top-down coding tends to be used in small projects 
than in large ones, not that top-down coding necessarily 
reduces the size of a development effort. 

Factor 3 - The third factor (Table 3c in Appendix A) contains a number of measures related to the pattern of computer usage. This factor indicates that the manner and
degree of computer usage reflect the use of certain development tools and techniques. The "best" estimator of this
factor is top-down design. 

Factor 4 - The fourth factor (Table 3d in Appendix A) has 
only one measure, semiformal quality assurance, signifi- 
cantly correlated with it. Thus, its meaning is difficult 
to establish. However, a substantial amount of variance 
(8 percent) is associated with this factor. The preceding 
factor contained five variables but explained only slightly
more variance (9 percent). Thus, this factor and measure 
deserve closer examination in future analyses. 

Factor 5 - The last factor (Table 3e in Appendix A) clearly 
describes the change rate. The interpretation of this fac- 
tor is important since, as a consequence of the mutual inde- 
pendence of factors, it is independent of the four factors 
previously defined. Hence, methodology intensity, project 
size, and computer usage do not appear to be related to each 
other or to code stability (reliability), as measured by the
change rate. 

Another feature of this model should be noted. Although 
productivity was most strongly correlated with factor 4, it 
was not significantly correlated with any factor. Productivity may still be related to specific methodologies but
not to the general factors just defined. Thus, the information provided by this procedure about productivity and reliability is negative in this example: it identifies qualities and measures that are unrelated to them rather than related ones.

CONCLUSION


The results presented here are preliminary. Conclusions 
based on the factor model just developed may change as more 
data become available and as the procedure is refined. How- 
ever, the analysis has demonstrated its capacity to resolve 
some important questions about the data. The conclusions 
are as follows: the basic qualities being quantified by the
original measures can be identified and enumerated; their
relative importance or strength (in terms of percentage of 
variance accounted for) can be established; and a "best" 
estimator can be selected for each quality. 

Therefore, we can define a concise set of quality measures 
that meaningfully characterizes the process and product com- 
ponents of the software development model and that can serve 
as a framework for further research. These qualities and 
associated measures can be studied in greater detail with 
other techniques to determine their relationships to produc- 
tivity and reliability more exactly. Hence, these results 
are a first step toward defining, explaining, and predicting 
software reliability and productivity in the SEL environment. 

APPENDIX A - SUMMARY OF FACTOR ANALYSIS


This appendix consists of a series of three tables that sum- 
marize the factor analysis procedure described in the pre- 
ceding discussion. Table 1 describes the measures evaluated 
in this analysis. Table 2 identifies the variances asso- 
ciated with factors. Table 3 lists the significant correla- 
tions (at the 0.01 level of significance) of measures with 
factors.

Table 1a. Methodology Measures


(DEGREE OF USE)

ORGANIZATION - CHIEF PROGRAMMER
DESIGN - WALKTHROUGHS
DESIGN - FORMAL REVIEWS
DESIGN - FORMALISMS
DESIGN - TREE CHARTS
DESIGN - PROGRAM DESIGN LANGUAGE (PDL)
DESIGN - HIERARCHICAL INPUT PROCESSING OUTPUT (HIPO)
DESIGN - TOP-DOWN
DESIGN - ITERATIVE ENHANCEMENT
CODE - STUBS
CODE - TOP-DOWN
CODE - STRUCTURED
CODE - WALKTHROUGHS
CODE - READ
CODE - CONFIGURATION CONTROL
TEST - FORMALISM
TEST - FOLLOWTHROUGH
TEST - BATCH
TEST - V&V PRESENCE
TEST - V&V USE

Table 1b. Tools Measures


(DEGREE OF USE)

FORMAL TRAINING IN METHODOLOGY
INFORMAL TRAINING
METHODOLOGY REINFORCEMENT
REQUIREMENTS LANGUAGE (MEDL-R)
DESIGN LANGUAGE (PDL)
PRECOMPILER (SFORT)
SOFTWARE AIDS (e.g., EXREF, MAP, LIST)
LIBRARIAN
DATA GENERATORS
TERMINALS (TSO)
REMOTE JOB PROCESSING (RJP)
CONFIGURATION ANALYSIS (CAT)



Table 1c. Documentation Measures


(DEGREE OF USE)

SEL FORMS
DESIGN DOCUMENT
DESIGN DECISIONS
SEMIFORMAL QUALITY ASSURANCE
ACTIVITY NOTEBOOKS
UNIT DEVELOPMENT FOLDERS
TEST PLANS
USER'S GUIDE/SYSTEM DESCRIPTION
FORMAL TREATMENT OF USER'S GUIDE
WEEKLY/MONTHLY PROGRESS REPORTS



Table 1d. Resource/Product Measures


(COUNTS AND RATES)

NUMBER OF COMPONENTS
TOTAL MODULES
NEW MODULES
MODIFIED MODULES
TOTAL LINES OF CODE (INCLUDES COMMENTS)
NEW LINES OF CODE (INCLUDES COMMENTS)
MODIFIED LINES OF CODE
NUMBER OF COMPUTER RUNS
NUMBER OF CHANGES
PAGES OF DOCUMENTATION
TOTAL MANHOURS
TOTAL COMPUTER HOURS
PERCENT OF NEW CODE
CHANGES PER LINE OF CODE
CHANGES PER LINE OF NEW CODE
NEW LINES + 20% OF REVISED LINES
LINES OF CODE PER MANHOUR
COMPUTER HOURS PER LINE OF CODE

Table 2. Preliminary Eigenvalues and Variances Associated With Factors


FACTOR   EIGENVALUES   PORTION   CUM PORTION
  1      16.492786     0.305     0.305
  2      13.305087     0.246     0.552
  3       4.744286     0.088     0.640
  4       4.063959     0.075     0.715
  5       2.855640     0.053     0.768
  6       2.438981     0.045     0.813
  7       1.738979     0.032     0.845
  8       1.555128     0.029     0.874
  9       1.469211     0.027     0.901
 10       1.101198     0.020     0.922
 11       0.931850     0.017     0.939
 12       0.685056     0.013     0.952
 13       0.621503     0.012     0.963
 14       0.495917     0.009     0.972
 15       0.427863     0.008     0.980
 16       0.397183     0.007     0.987
 17       0.255911     0.005     0.992
 18       0.209853     0.004     0.996
 19       0.153974     0.003     0.999
 20       0.055635     0.001     1.000
 21       0.000000     0.000     1.000


NOTE: Only five factors were retained in the analysis.
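Each PORTION entry is the factor's eigenvalue divided by the total variance of the standardized measures. That total is 54, the number of measures analyzed, and the 21 eigenvalues listed do sum to 54.0. A short check using the first five eigenvalues from the table reproduces the percentages and the 77 percent total quoted in the Summary of Results:

```python
from itertools import accumulate

# Eigenvalues of the five retained factors, copied from Table 2.
eigenvalues = [16.492786, 13.305087, 4.744286, 4.063959, 2.855640]
n_measures = 54  # standardized measures, each contributing variance 1

portion = [lam / n_measures for lam in eigenvalues]
cumulative = list(accumulate(portion))

print([round(p, 3) for p in portion])  # [0.305, 0.246, 0.088, 0.075, 0.053]
print(round(cumulative[-1], 3))        # 0.768
```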



Table 3a. Factor 1 


MEASURE CORRELATION 


CHIEF PROGRAMMER ORGANIZATION .62 

DESIGN WALKTHROUGHS .75 

FORMAL DESIGN REVIEWS .75 

DESIGN FORMALISMS .83 

DESIGN TREE CHARTS .65 

PROGRAM DESIGN LANGUAGE (METHODOLOGY) .63 

CODE STUBS .86 

CODE WALKTHROUGHS .69 

CODE READING .60 

CONFIGURATION CONTROL (METHODOLOGY) .62 

TEST FORMALISMS .74 

TEST FOLLOWTHROUGH .72 

FORMAL TRAINING IN METHODOLOGY .78 

INFORMAL TRAINING .61 

METHODOLOGY REINFORCEMENT .89* 

DESIGN LANGUAGE (TOOL) .64 

SOFTWARE (CODING) AIDS .68 

LIBRARIAN .85 

DATA GENERATORS .71 

REMOTE JOB ENTRY .54 

SEL FORMS .76 

DESIGN DOCUMENT .73 

DESIGN DECISION (DOCUMENTATION) .75 

ACTIVITY NOTEBOOKS .76 

USER'S GUIDE/SYSTEM DESCRIPTION .69 

WEEKLY/MONTHLY PROGRESS REPORTS .69 

NOTE: VARIANCE ACCOUNTED FOR: 31%. 

Table 3b. Factor 2


MEASURE CORRELATION

NUMBER OF COMPONENTS .89

TOTAL MODULES .89

NEW MODULES .85

MODIFIED MODULES .80

TOTAL LINES .91

NEW LINES .92

MODIFIED LINES .77

NUMBER OF RUNS .91

NUMBER OF CHANGES .93

PAGES OF DOCUMENTATION .94

PERSON HOURS .96*

COMPUTER HOURS .88

DELIVERED LINES .93

TOP-DOWN CODING -.56


NOTE: VARIANCE ACCOUNTED FOR: 25%.

Table 3c. Factor 3


MEASURE CORRELATION

COMPUTER HOURS/LINES OF CODE .60

TOP-DOWN DESIGN .88*

BATCH TESTING .70

REMOTE JOB ENTRY .69

TEST PLANS -.57


NOTE: VARIANCE ACCOUNTED FOR: 9%.



Table 3d. Factor 4

MEASURE CORRELATION

SEMIFORMAL QUALITY ASSURANCE .60*

(PRODUCTIVITY -.30)

NOTE: VARIANCE ACCOUNTED FOR: 8%.

Table 3e. Factor 5

MEASURE CORRELATION

CHANGES/LINES OF CODE .73* 

CHANGES/LINES OF NEW CODE .64 

NOTE: VARIANCE ACCOUNTED FOR: 5%. 


PANEL #3


SOFTWARE MODELS 

B. Littlewood/A. Sofer, George Washington University 
H. Sayani/C. Svoboda, Advanced Systems Technology Corporation 



SOFTWARE MODELS: 


A BAYESIAN APPROACH TO PARAMETER ESTIMATION IN THE 
JELINSKI-MORANDA SOFTWARE RELIABILITY MODEL 

by 

Bev Littlewood, The City University, London, England 
Ariela Sofer, The George Washington University, Washington, D.C. 


Abstract 

Maximum likelihood estimation procedures for the Jelinski-Moranda 
software reliability model often give misleading answers. We show here 
that a reparameterization and a Bayesian analysis eliminate some of the 
problems incurred by MLE methods and often give better predictions on 
sets of real and simulated data. 

Practical difficulties in estimating the initial number of errors
$N$ and the failure rate of each error $\phi$ by the method of maximum likelihood are:

1. $\hat N$, the MLE of $N$, is occasionally infinite (i.e., the routines
for calculating $\hat N$ and $\hat\phi$ do not converge). Littlewood and
Verrall show that $\hat N$ is finite if and only if the regression
line of the interevent times $t_i$ vs. $i$ has positive slope.

2. A serious problem is that often $\hat N \approx n$, the sample size, and
sometimes $\hat N = n$. Thus the MLE predicts that the program is
perfect even when it is far from being so. Forman and Singpurwalla have shown that $\hat N$ and $\hat\phi$ can only be trusted near the
end of debugging, i.e., when almost all failures have been
removed.



3. Even when these problems are not encountered, the results 
obtained from the model are too optimistic; it predicts the 
reliability to be greater than it really is. 

In view of these deficiencies, we are led to consider a Bayesian
approach to the estimation problem. It seems plausible that it is easier
to correctly estimate the initial program failure rate $\lambda = N\phi$ than the
initial number of bugs $N$: since $N = \lambda/\phi$, small errors in $\phi$ could lead to large
errors in $N$. It is therefore plausible to reparameterize the model to
$(\lambda, \phi)$ instead of $(N, \phi)$.

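Using now the Bayesian approach, letting $\mathrm{prior}(\lambda,\phi) = \mathrm{prior}(\lambda)\,\mathrm{prior}(\phi)$, where $\mathrm{prior}(\lambda)$ and $\mathrm{prior}(\phi)$ are gamma distributed, and using

$$F_i(t) = P(T_i < t \mid t_1,\ldots,t_{i-1}) = \iint P(T_i < t \mid \lambda,\phi)\,\mathrm{post}(\lambda,\phi \mid t_1,\ldots,t_{i-1})\,d\lambda\,d\phi$$

we obtain an explicit estimate of the program's current reliability.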
Similarly, we can get in closed form the distributions of the number of 
bugs remaining in the program, the number of bugs that have to be removed 
in order to attain a given reliability, and the times between future 
consecutive failures (provided they are well defined, i.e., the program 
is not perfect). 

The quality of these estimations was examined for the special case
when $\lambda$ and $\phi$ have an (improper) uniform prior distribution over
$[0,\infty)$ (i.e., a noninformative prior distribution). The predictions were
examined both for real and for simulated sets of data. In all cases where
ML erroneously predicts the program to be perfect, the Bayesian method
gives a positive probability that the program is not perfect. Moreover,
since the predicted reliability is given in closed form, problems of
convergence of the computer program are not encountered.
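A change of variables shows one way a closed form can arise under these uniform priors. This is a sketch under stated assumptions, not a derivation given in the text, and it assumes the posterior is restricted to $N \ge i$.

Write $\nu = \phi(N - i) = \lambda - i\phi$ for the failure rate after the first $i$ fixes. This is a shift with unit Jacobian, so the rate during the $j$-th interval is $\nu + (i - j + 1)\phi$. Let $A = \sum_{j=1}^{i} t_j$ and $B = \sum_{j=1}^{i}(i-j+1)t_j$. Then

$$\mathrm{post}(\nu,\phi \mid t_1,\ldots,t_i) \propto \prod_{j=1}^{i}\bigl(\nu + (i-j+1)\phi\bigr)\,e^{-\nu A-\phi B} = \sum_{k=0}^{i} a_{i,k}\,\nu^{k}\phi^{\,i-k}\,e^{-\nu A-\phi B}, \qquad \nu,\phi \ge 0,$$

where $a_{i,k}$ is the coefficient of $x^k$ in $\prod_{m=1}^{i}(x+m)$. Each term factorizes into gamma integrals, for example $\int_0^\infty \nu^{k} e^{-\nu A}\,d\nu = k!/A^{k+1}$. The next interval $T_{i+1}$ is exponential with rate $\nu$, so

$$P(T_{i+1} > t \mid t_1,\ldots,t_i) = \frac{\sum_{k=0}^{i} a_{i,k}\,k!\,(i-k)!\,B^{-(i-k+1)}(A+t)^{-(k+1)}}{\sum_{k=0}^{i} a_{i,k}\,k!\,(i-k)!\,B^{-(i-k+1)}A^{-(k+1)}}.$$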

To examine the quality of prediction, we use a goodness of fit procedure. Suppose that from the data $t_1, \ldots, t_n$ we predict the distribution of $T_{n+1}$, the time to next failure. We then observe $t_{n+1}$. Define $U_n = \Pr(T_{n+1} < t_{n+1})$, computed from the predicted distribution. If the model is correct, then the $U_n$ are uniform variables on $(0,1)$. We compare the sample c.d.f. of the $u_n$'s with a line of unit slope, which is the uniform c.d.f.
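The comparison of the sample c.d.f. of the $u$'s with the unit-slope line can be summarized by its largest vertical gap. The text compares the curves graphically; using this maximum distance (a Kolmogorov-type statistic) as a summary is an assumption here. The function name `u_plot_distance` is illustrative.

```python
def u_plot_distance(us):
    """Max vertical distance between the step c.d.f. of the u's and the
    uniform c.d.f. (the line of unit slope) on (0, 1)."""
    u = sorted(us)
    m = len(u)
    if m == 0:
        raise ValueError("need at least one u value")
    gap = 0.0
    for k, x in enumerate(u, start=1):
        # The step c.d.f. jumps from (k - 1)/m to k/m at x.
        gap = max(gap, k / m - x, x - (k - 1) / m)
    return gap


print(round(u_plot_distance([0.1, 0.5, 0.9]), 4))    # 0.2333
# u's bunched near 0: failures came sooner than predicted (optimistic model).
print(round(u_plot_distance([0.05, 0.1, 0.2]), 4))   # 0.8
```

The direction of the departure matters as well as its size. A step c.d.f. lying above the line means the model predicted failures later than they occurred, which is the optimism attributed to the J-M model below.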

When applying the goodness of fit procedure to real data sets, the 
Bayesian approach is almost always better than the MLE method. For the 
simulated data, the goodness of fit procedure on the Bayesian estimates
gives very good results; this, however, is not always true for the real
data sets. 

There seems to be evidence that the J-M model is intrinsically opti- 
mistic in its estimate of software reliability. This could be a conse- 
quence of the assumption that all errors contribute equally to the failure 
rate. A new model by Littlewood relaxes this assumption with the result 
that earlier fixes tend to involve larger reductions in the failure rate 
than the later ones. It can be shown that this model is less optimistic 
than the J-M model and we hope to examine its performance on real and 
simulated data in future work. 

JELINSKI-MORANDA model assumptions:

1. Successive inter-failure times $T_1, T_2, \ldots$ are independent, with

$$\mathrm{pdf}(t_i \mid \lambda_i) = \lambda_i e^{-\lambda_i t_i}$$

2. $\lambda_i = (N - i + 1)\phi$, where $N$ is the "initial number of faults" and $\phi$ is the "contribution to program failure rate from each fault".

[Figure: f.r. (failure rate) versus time]

Note that

1. All fixes have the same effect.

2. Same model by SHOOMAN and MUSA. Same assumptions for NHPP model by GOEL-OKUMOTO.
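The two assumptions translate directly into a simulator. After $i-1$ fixes, the next inter-failure time is exponential with rate $(N-i+1)\phi$. Data generated this way are the kind of "simulated data" used in the comparisons of ML and Bayes estimates. The function `simulate_jm` is a hypothetical sketch, and the parameter values in the example are made up.

```python
import random


def simulate_jm(n_faults, phi, n_failures, seed=None):
    """Draw inter-failure times t_1..t_n under the Jelinski-Moranda model.

    T_i is exponential with rate lambda_i = (N - i + 1) * phi; each fix
    removes exactly one fault, so all fixes have the same effect.
    """
    if n_failures > n_faults:
        raise ValueError("cannot observe more failures than initial faults")
    rng = random.Random(seed)
    return [rng.expovariate((n_faults - i + 1) * phi)
            for i in range(1, n_failures + 1)]


times = simulate_jm(n_faults=40, phi=0.002, n_failures=30, seed=1)
print(len(times), all(t >= 0 for t in times))  # 30 True
```

On average the intervals lengthen in a fixed pattern: the expected i-th interval is 1/((N - i + 1) * phi), e.g. 12.5 for the first interval here.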

There seem to be 3 problems with J-M:

1. $\hat N$ occasionally infinite ($\hat\phi = 0$)

Nec. & suff. condition: "Regression line of $t_i$ versus $i$ has negative slope" (Littlewood, Verrall: 1981 IEEE TR)

This can also occur with simulated data from J-M with finite $N$, $\phi \neq 0$.

However $\hat\lambda = \hat N\hat\phi$ is finite, non-zero.

2. Reliability predictions always(?) too optimistic

3. $\hat N$ usually too small, sometimes equal to sample size (i.e. program is "perfect")

Table 7.

Failure Intervals - System 3 System Test Phase

 i      T_i
 1      115      1
 2        0      1
 3       83      3
 4      178      3
 5      194      3
 6      136      3
 7     1077      3
 8       15      3
 9       15      3
10       92      3
11       50      3
12       71      3
13      606      6
14     1189      8
15       40      8
16      788     18
17      222     18
18       72     18
19      615     18
20      589     26
21       15     26
22      390     26
23     1863     27
24     1337     30
25     4508     36
26      834     38
27     3400     40
28        6     40
29     4561     42
30     3186     44
31    10571     47
32      563     47
33     2770     47
34      652     48
35     5593     50
36    11696     54
37     6724     54
38     2546     55
39    10175     56

SYSTEM 3

FAILURE   $\hat N$ (ESTIMATED INITIAL NUMBER OF FAILURES)   ESTIMATED INITIAL MTTF   $\hat\phi$ (MORANDA PHI)
 2        999999        0.5750E+02        0.173913E-07
 3        999999        0.6600E+02        0.151515E-07
 4             5        0.5900E+02        0.338983E-02
 5             6        0.6480E+02        0.257202E-02
 6             8        0.7275E+02        0.171821E-02
 7             7        0.7884E+02        0.181206E-02
 8             8        0.8845E+02        0.141318E-02
 9            12        0.1196E+03        0.696972E-03
10            19        0.1396E+03        0.377017E-03
11            55        0.1609E+03        0.112990E-03
12        999999        0.1688E+03        0.592304E-08
13            22        0.1387E+03        0.327621E-03
14            15        0.1125E+03        0.592367E-03
15            18        0.1306E+03        0.425447E-03
16            18        0.1306E+03        0.425294E-03
17            21        0.1476E+03        0.322715E-03
18            25        0.1616E+03        0.247463E-03
19            25        0.1622E+03        0.246615E-03
20            25        0.1612E+03        0.248210E-03
21            31        0.1807E+03        0.178535E-03
22            33        0.1854E+03        0.163413E-03
23            26        0.1609E+03        0.239046E-03
24            26        0.1606E+03        0.239457E-03
25            25        0.1520E+03        0.263205E-03
26            26        0.1628E+03        0.236199E-03
27            27        0.1764E+03        0.210001E-03
28            28        0.1876E+03        0.190384E-03
29            29        0.2023E+03        0.170456E-03
30            30        0.2182E+03        0.152766E-03
31            31        0.2427E+03        0.132935E-03
32            32        0.2642E+03        0.118265E-03
33            33        0.2853E+03        0.106202E-03
34            34        0.3041E+03        0.967196E-04
35            35        0.3248E+03        0.879556E-04
36            36        0.3519E+03        0.789439E-04
37            37        0.3804E+03        0.710397E-04
38            38        0.4073E+03        0.646041E-04
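A short maximum likelihood routine can reproduce the early rows of this table. The sketch rests on several assumptions:

- The k-th row uses $t_1,\ldots,t_k$ from Table 7.
- $N$ is searched over whole numbers $N \ge k$, because the table reports integers.
- For fixed $N$, the MLE of $\phi$ is $k / \sum_j (N-j+1)t_j$.
- When the Littlewood-Verrall slope condition fails, $\hat N$ is reported as infinite. The table's 999999 appears to be a sentinel for this case, and then only $\hat\lambda = k/\sum_j t_j$ is finite.

The initial MTTF column is $1/(\hat N\hat\phi) = 1/\hat\lambda$. The function name `jm_mle` is hypothetical.

```python
import math

# First twelve failure intervals T_1..T_12 from Table 7 (System 3).
SYSTEM3 = [115, 0, 83, 178, 194, 136, 1077, 15, 15, 92, 50, 71]


def jm_mle(times, max_faults=100_000):
    """Jelinski-Moranda ML estimates (N_hat, phi_hat, lambda_hat) from t_1..t_k.

    N is searched over whole numbers k..max_faults (an assumption suggested by
    the integer N values in the table). When the Littlewood-Verrall condition
    fails, N_hat is math.inf, phi_hat is 0, and only lambda_hat = k / sum(t)
    is finite.
    """
    k = len(times)
    total = sum(times)
    # Finite N_hat iff sum_i (i - (k + 1)/2) * t_i > 0, i.e. the regression
    # line of t_i on i has positive slope.
    slope_stat = sum((i - (k + 1) / 2) * t for i, t in enumerate(times, start=1))
    if slope_stat <= 0:
        return math.inf, 0.0, k / total
    weighted = sum((i - 1) * t for i, t in enumerate(times, start=1))
    best_ll, best_n, best_phi = -math.inf, k, 0.0
    for n_faults in range(k, max_faults + 1):
        # For fixed N the MLE of phi is k / sum_i (N - i + 1) t_i.
        phi = k / (n_faults * total - weighted)
        # Profile log-likelihood: sum_i log((N - i + 1) phi) - k.
        ll = (math.lgamma(n_faults + 1) - math.lgamma(n_faults - k + 1)
              + k * math.log(phi) - k)
        if ll > best_ll:
            best_ll, best_n, best_phi = ll, n_faults, phi
    # A result equal to max_faults means the search bound was reached.
    return best_n, best_phi, best_n * best_phi


for k in (2, 4, 5):
    n_hat, phi_hat, lam_hat = jm_mle(SYSTEM3[:k])
    print(k, n_hat, round(phi_hat, 8), round(1 / lam_hat, 2))
# 2 inf 0.0 57.5
# 4 5 0.00338983 59.0
# 5 6 0.00257202 64.8
```

The sketch is less reliable in later rows. There the profile likelihood can be nearly flat between neighboring integers, so whether it matches the table depends on details of the original program that the text does not give.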
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How well does the model perform?


The simplest problem is estimating current reliability:

Given data $t_1, \ldots, t_{i-1}$, what can we say about $T_i$?

What is its cdf, $F_i(t)$?

Obtain ML estimates $\hat{N}, \hat{\phi}$ based on $t_1, \ldots, t_{i-1}$ and use the "predictor distribution"

$\hat{F}_i(t) = 1 - \exp\{-(\hat{N} - i + 1)\hat{\phi}\, t\}$

If the prediction is "good", $U_i = \hat{F}_i(T_i)$ is approximately U(0,1). Examine Q-Q plots of the realizations $u_i$.
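A minimal Python sketch of this procedure, assuming the standard J-M likelihood in which $T_j$ is exponential with rate $(N - j + 1)\phi$. The function names are ours, $N$ is restricted to integers, and the simple upward search assumes the profile likelihood in $N$ has a single peak; the cap `n_cap` stands in for "no finite estimate".

```python
import math


def jm_fit(times, n_cap=100_000):
    """ML estimates (N_hat, phi_hat) of the Jelinski-Moranda model.

    times: interfailure times t_1, ..., t_n (all > 0, n >= 2).
    N is searched over integers by climbing the profile log-likelihood
    from N = n upward (assumes a single peak). Returning n_cap suggests
    the likelihood is still rising, i.e. no finite MLE.
    """
    n = len(times)
    s = sum(times)                                # sum of t_j
    w = sum(k * t for k, t in enumerate(times))   # sum of (j - 1) t_j, j 1-based

    def profile_ll(N, log_terms):
        # For fixed N, phi_hat(N) = n / sum_j (N - j + 1) t_j = n / (N s - w).
        return log_terms + n * math.log(n / (N * s - w)) - n

    N = n
    log_terms = sum(math.log(N - k) for k in range(n))  # sum_j log(N - j + 1)
    ll = profile_ll(N, log_terms)
    while N < n_cap:
        # Moving from N to N + 1 adds log(N + 1) and drops log(N + 1 - n).
        nxt_terms = log_terms + math.log(N + 1) - math.log(N + 1 - n)
        nxt_ll = profile_ll(N + 1, nxt_terms)
        if nxt_ll <= ll:
            break
        N, log_terms, ll = N + 1, nxt_terms, nxt_ll
    return N, n / (N * s - w)


def jm_predictor(history, t):
    """Plug-in F_i(t) = 1 - exp(-(N_hat - i + 1) phi_hat t), with i = n + 1."""
    n_hat, phi_hat = jm_fit(history)
    i = len(history) + 1
    return 1.0 - math.exp(-(n_hat - i + 1) * phi_hat * t)


def u_values(times, start):
    """u_i = F_i(t_i) for i = start, ..., len(times), each fitted on t_1..t_{i-1}."""
    return [jm_predictor(times[: i - 1], times[i - 1])
            for i in range(start, len(times) + 1)]
```

Sorting the returned `u_values` and plotting them against uniform quantiles gives the Q-Q plot mentioned above. When the fit gives $\hat{N} = i - 1$, the model predicts no further failures and the corresponding $u_i$ is 0.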
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EXAMPLE 


Data: MUSA “System 1”, range of i: 30-129
Jelinski-Moranda: poor prediction, optimistic 
Littlewood-Verrall: good prediction, slight pessimism 
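Verdicts such as "optimistic" or "slight pessimism" can be read off the u-plot, the empirical cdf of the $u_i$ drawn against the line of unit slope. The sketch below computes one common summary, the Kolmogorov distance between the two, together with the direction of the largest deviation. On the usual reading, u's bunched near 0 (failures arriving sooner than predicted) push the plot above the diagonal and indicate optimistic prediction. The function name and return convention are our own.

```python
def u_plot_distance(us):
    """Kolmogorov distance between the u-plot and the line of unit slope.

    us: realizations u_i = F_i(t_i). Returns (distance, signed), where
    signed > 0 means the largest deviation has the empirical cdf above
    the diagonal and signed < 0 means it lies below.
    """
    us = sorted(us)
    n = len(us)
    distance, signed = 0.0, 0.0
    for k, u in enumerate(us):
        above = (k + 1) / n - u   # top of the step at u versus the diagonal
        below = u - k / n         # diagonal versus the bottom of the step
        if above > distance:
            distance, signed = above, above
        if below > distance:
            distance, signed = below, -below
    return distance, signed
```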
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Bayesian J-M 


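Reparameterize from $(N, \phi)$ to $(\lambda, \phi)$, where $\lambda = N\phi$ is the "initial failure rate".

Assume:

$\mathrm{prior}(\lambda, \phi) = \mathrm{prior}(\lambda) \cdot \mathrm{prior}(\phi)$, where $\mathrm{prior}(\lambda)$ and $\mathrm{prior}(\phi)$ are gamma distributions.

Then the "predictor distribution" is

$F_i(t) = P(T_i < t) = P(T_i < t \mid t_1, \ldots, t_{i-1}) = \iint P(T_i < t \mid \lambda, \phi)\, \mathrm{post}(\lambda, \phi \mid t_1, \ldots, t_{i-1})\, d\lambda\, d\phi$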


Reparameterization: Informal Justification

[Figure: plotted against i, failure number]
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For the case of uniform (improper) priors we get: 

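$F_{i+1}(t \mid t_1, \ldots, t_i) = 1 - \frac{1}{c} \sum_{k=0}^{i} \frac{a_{i,k}\, k!\, (i-k)!}{\left(t + \sum_{j=1}^{i} t_j\right)^{k+1} \left(\sum_{j=1}^{i} (i-j+1)\, t_j\right)^{i-k+1}}$

where $c = \sum_{k=0}^{i} \frac{a_{i,k}\, k!\, (i-k)!}{\left(\sum_{j=1}^{i} t_j\right)^{k+1} \left(\sum_{j=1}^{i} (i-j+1)\, t_j\right)^{i-k+1}}$

and where $a_{i,k}$ is the coefficient of $x^k$ in $\prod_{m=1}^{i} (x + m) = \sum_{k=0}^{i} a_{i,k}\, x^k$.

These coefficients are easily computed from the relation

$a_{0,0} = 1, \qquad a_{i,k} = a_{i-1,k-1} + i\, a_{i-1,k}$ (with $a_{i-1,-1} = a_{i-1,i} = 0$)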
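The following Python sketch evaluates a Bayesian J-M predictor under stated assumptions: $T_j$ is exponential with rate $\lambda - (j-1)\phi$, where $\lambda = N\phi$, and the improper prior is flat over $\lambda \ge i\phi$, $\phi > 0$. Substituting $\mu = \lambda - i\phi$ turns the likelihood of $t_1, \ldots, t_i$ into $\sum_k a_{i,k}\, \mu^k \phi^{i-k} e^{-\mu \sum t_j - \phi \sum (i-j+1) t_j}$, with $a_{i,k}$ the coefficients of $\prod_{m=1}^{i}(x+m)$, so every integral is a gamma integral. The function names are hypothetical, and the terms are summed in log space because the powers quickly overflow floating point.

```python
import math


def product_coefficients(i):
    """Exact coefficients a[k] of x**k in (x + 1)(x + 2)...(x + i).

    Multiplies in one factor at a time, which is the recurrence
    a[i][k] = a[i-1][k-1] + i * a[i-1][k], starting from a[0][0] = 1.
    """
    a = [1]
    for m in range(1, i + 1):
        nxt = [0] * (m + 1)
        for k, c in enumerate(a):
            nxt[k] += m * c    # the "+ m" term keeps the power of x
            nxt[k + 1] += c    # the "x" term raises it by one
        a = nxt
    return a


def bayes_jm_predictor(times, t):
    """P(T_{i+1} <= t | t_1, ..., t_i) under the assumptions stated above."""
    i = len(times)
    s = sum(times)                                          # sum_j t_j
    r = sum((i - j) * tj for j, tj in enumerate(times))     # sum_j (i - j + 1) t_j
    a = product_coefficients(i)

    def log_sum(x):
        # log of sum_k a_k k! (i-k)! / (x^(k+1) r^(i-k+1)), via log-sum-exp
        logs = [math.log(a[k]) + math.lgamma(k + 1) + math.lgamma(i - k + 1)
                - (k + 1) * math.log(x) - (i - k + 1) * math.log(r)
                for k in range(i + 1)]
        top = max(logs)
        return top + math.log(sum(math.exp(v - top) for v in logs))

    # Survival = log_sum(s + t) / c with c = log_sum(s); the cdf is its complement.
    return 1.0 - math.exp(log_sum(s + t) - log_sum(s))
```

As a check, for a single observation $t_1 = 1$ the predictor reduces to $1 - \tfrac12\left[(1+t)^{-1} + (1+t)^{-2}\right]$, so `bayes_jm_predictor([1.0], 1.0)` evaluates to about 0.625.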


DATA: MUSA "System 3", i = 18, ..., 37

-- J-M ML estimation of $(N, \phi)$

-- J-M Bayes, uniform (improper) priors on $(\lambda, \phi)$
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Conclusions


1. Bayes J-M seems always (?) better than MLE J-M, but sometimes only slightly. 


2. Results on real data are always optimistic. 


3. But on SIMULATED data from the J-M model, Bayes is very good and ML is poor.


So real data do not follow the J-M model?
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Hypothesis: The assumption of equal $\phi$'s is wrong. In fact, the $\phi$'s are different.

Larger ones tend to be eliminated earlier:


[Figure: failure rate (f.r.)]
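The mechanism behind this hypothesis can be illustrated with a small simulation, under assumptions that are ours rather than the source's: each fault k has its own rate $\phi_k$ drawn from a gamma distribution, causes its first failure after an exponential time with that rate, and is removed at once by a perfect fix, independently of the other faults. Faults with larger $\phi_k$ tend to be found first, so early fixes remove more failure rate than later ones, unlike the equal steps assumed by J-M.

```python
import random


def simulate_fault_removal(n_faults=2000, shape=1.0, scale=1.0, seed=1):
    """Hypothetical illustration of faults with unequal rates phi_k.

    Returns (removed, remaining): the phi values in the order the faults
    are removed, and the total failure rate left after each removal.
    """
    rng = random.Random(seed)
    phis = [rng.gammavariate(shape, scale) for _ in range(n_faults)]
    # Each fault's first failure is Exp(phi_k); sorting gives the removal order.
    order = sorted((rng.expovariate(phi), phi) for phi in phis)
    removed = [phi for _, phi in order]
    # Rate remaining after the m-th removal is the sum of the phis removed later.
    remaining, tail = [], 0.0
    for phi in reversed(removed):
        remaining.append(tail)
        tail += phi
    remaining.reverse()
    return removed, remaining
```

Plotting `remaining` against the removal index gives a failure-rate curve whose steps shrink as testing proceeds, which is the shape the hypothesis describes.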
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0. Introduction 

1. Composite Case Study 

2. Analysis of the Problem 

3. Generalization of a Solution to the Problem 

4. Conclusion 


Hasan H. Sayani, Ph.D.
Cyril P. Svoboda, Ph.D. 


Advanced Systems Technology Corporation 
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Greenbelt, Maryland 20770
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ABSTRACT 


Developers of information systems are bombarded with publicity releases
hawking a plethora of tools and techniques. Although vendors give the impression
that their products will lead the developer to the "promised land", they are
rarely able to deliver. The result is that information systems developers ride a
rollercoaster: rising to a peak of expectation and hope, only to plummet down the
track of reality, before beginning to climb up to yet another peak of hope. This
paper will analyze this situation from the authors' perspective, formed by using
various information system tools/techniques and by consulting with over ten
Fortune 500 firms and six government agencies.

A case study will be presented which draws together the issues raised in
three distinct cases. The names of the organizations, and any other information
that might lead to their identification, will be changed. This case study will
show a typical progression from the selection of an analysis methodology (SA)
to the adoption of an automated tool for specification and documentation
(PSL/PSA), and the difficulty of fitting these into an existing life cycle
development methodology. The problem presented in the case study is similar to
the problem of resonance: over a period of time, the morale of system developers
reels through a journey over peaks of "hyped" expectations and down into valleys
of depressing realizations. In addition, management is weighed down with the
pressures of short-term goals and the burdens created by long-ignored human
factors, both of which entice management to press for "any" product rather than
the "right" product. The technology to which both developers and management often
turn in desperation is marked by disparate development and by the shallow
experience of the developers. Lastly, the mentality of those employing
development tools and/or techniques is

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very often provincial, relegating various items to a rigidly determined set of
categories, or hardware-driven.

The general approach to a solution is taken from a procedure for problem-
solving developed by Svoboda and Sayani (1980). In this procedure, the system
developer is encouraged first to take time to examine the problem before
attempting to solve it, defining its major dimensions and determining the
evaluative criteria to be used in assessing any proposed solution. Then the
problem-solver uses some visualization tactic suited to his/her cognitive style
or suggested by an organization's methodology. These visualizations are then
elaborated by translating them into linguistic expressions, at various levels of
formality or precision. What is expressed needs to be reflected upon, so that the
composer can grasp the implications of what has been said from various points of
view, with a differing focus or scope. Even when what has been said seems, on
reflection, to be what was intended, it must next be analyzed or evaluated
against the earlier determined criteria, in light of any constraints, within the
scope of resources available. Those specifications which do not "pass" this
evaluation must be modified, and this expression-reflection-evaluation-modification
process must be repeated until the system has been completely specified and is
ready for construction and implementation. Before the development team
congratulates itself for a job "well done", it should project which
tool/technique ought next to be selected and employed, and what has been learned
from the whole process of system development that might give direction to the
next effort.

If an organization does not employ such an approach in systems development,
it will eventually begin to experience the rollercoaster ride mentioned earlier.
If it does employ such an approach, the organization will be in a better position
from which to assess the intrinsic quality of its tools/techniques and their
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contribution to the successful development of information systems. Such an 
approach would offer the basis for guiding an organization in the introduction,
facilitation and institutionalization of new tools/techniques for the development
of future information systems.
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CREDENTIALS 


Corporate Objectives 

— R & D in IS development process 

-- analysis, design, code generation and life cycle
management automation tools 

-- engineering and human factors background 

-- application of tools on projects 


Corporate Experience 

— instruction and application of tools 

- 23 courses, seminars & workshops on PSL/PSA,
methodologies, tools (ADL, ADS)

— consultation with organizations using tools 

(over 10 Fortune 500 & major Government Agencies)
on all levels of organization 

- executive 

- management 

- operational 

— evaluation of usage of tools 
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PREVIEW OF PRESENTATION 


Composite Case Study

-- examination of organization background in software development process

- recognition of need for formal techniques 

— response to problem 

-- result of piece-wise intro of tools 

Analysis of Situation 
Generalization Approach 
Conclusion 
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COMPOSITE CASE STUDY 


Examination of Background in Software Development Process 


— third generation of hardware 

-- obsolete/poorly documented existing systems 
-- high turnover/additions to systems people 

— dissatisfied users viewing systems as: 

- inadequate and costly 

- in large backlog/overruns 

- unintegrated 

-- lure of effortless development via tools and 
techniques 

- "let's get on some bandwagon" 
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RESPONSE TO PROBLEM 


"Small is beautiful" 


"Have Money - Will Buy Tools" 

-- one for each phase of development life cycle 
-- acquire tools 
— train pilot group 






RESPONSE TO PROBLEM 


Apply the Solution 

-- results can range from success to disaster

Next Evolutionary Step 

— pass on work from one phase to another, or 

— have a second group use the same tool 

— both of which are usually doomed to disaster 

Backlash 

-- build in-house 

-- force fit a tool by outspoken advocate 

— regress 
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ANALYSIS OF SITUATION 


Problem of Introduction 

— reality rarely matches overall expectations 

— never possible in isolation 

- distortion between existing and new 
techniques for each tool 

-- difficulty of integration across life cycle phases
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ANALYSIS OF SITUATION 


Management "Baggage" 


— short term goals 
-- due-date versus quality 

-- ignoring human factors 

- career-path implications 

- E & T budget 

- management styles 

authoritative 

democratic 

laissez-faire 
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ANALYSIS OF SITUATION 


Technology Growing Pains 

— first generation of tools/techniques 

- shallow experience 

-- vendor myopia and user passivity 

-- disparately developed 

- no overall plan of action 

— changing ground rules 

- cost parameters (hardware/software ratios) 

- rapidly changing base technologies 

DBMS 

A-I 

Graphics 
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ANALYSIS OF SITUATION 


Field Immaturity 

— failure to recognize commonalities 
e.g., different types of systems

- engineering vs commercial 

-- financial and legal community's effect 

- capitalization 

- protection (e.g., copyright/trade secrets)

- inability to keep up with rate of change 

— Governmental approach 

- doesn't foster coordinated effort 
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GENERALIZATION APPROACH 


(Problem-Solving) 


Problem Recognition 

— postpone solution before understanding 

— dimensions of problem 

-- developing criteria for judging solution


Visualization 

-- cognitive style 

— methodology 

— merely a basis for further work 
- not universal


Expression 

-- graphics
-- linguistic 

- levels of formality 


Reflection 

-- other than mere echo of expression 
-- other focus/scope/dimension
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GENERALIZATION APPROACH 


(Problem-Solving) 


Analysis/Evaluation 

— comparing against criteria 

— evaluate against constraints 

— realization of resources available 


Modification/Iteration 

-- sensitivity analysis
— impact projection 


Solution 


-- determination and presentation of product


Iteration 

— where should next tool fit? 

— what have we learned from experience? 
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CONCLUSION 


— User organization: "get your house in order" 


— Articulate needs of tools/techniques 


— Set quality standards 


— Evaluate existing tools/techniques 


— Walk through whole development cycle scenario 


— Introduce in a studied fashion 

- deliverables 

- career paths 

- feedback 

- support usage 

- training 


— Study the process as well as the problem 
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SOFTWARE METHODOLOGIES 

H. Mills/M. Dyer, IBM 

B. Jones, Hughes Aircraft Corporation 

R. Hamilton, Bell Labs 



Sixth Annual Software Engineering Workshop
Goddard Space Flight Center 
December 2, 1981 

Cleanroom Software Development 

M. Dyer and H. D. Mills 


The 'cleanroom' software development process is a new IBM technical
and organizational approach to developing software with certifiable
reliability. Key ideas behind the process are well-structured software
specifications, randomized testing methods and the introduction of
statistical controls; but the main point is to deny entry to defects
during the development of software. This latter point suggests the
use of the term 'cleanroom', in analogy to the defect prevention
controls used in the manufacture of high-technology hardware.

The present state of the art in software development is to conceive
and design a system in response to perceived requirements, then test
the system with cases perceived to be typical of those requirements.
The result is frequently a system which works well against inputs
similar to those tested for, but which is unreliable in unexpected
circumstances. In fact, the evidence obtained by such testing is
entirely anecdotal rather than statistical.

In the 'cleanroom', we embed the entire software development process
within a formal statistical design. Rather than executing selected
tests and appealing to the randomness of operational settings to draw
statistical inferences, we introduce random testing as a part of the
statistical design itself, so that when development and testing are
completed, statistical inferences can be made about the future
operation of the system.
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We believe there are several major benefits to such a procedure. One 
benefit is derived from standard statistical procedures in which a
formal statistical design permits objective statements about properties
of the system. But we believe an even more important benefit will arise
from the effect that the discipline of the statistical design has on the
developers' activities. In fact, we believe that developing systems
under stringent statistical controls will induce significant behaviour
modifications in software developers.

Presently, when developers conceive early tests to check the correct
operation of a system, they are able to identify just those parts of
the system that will have to function correctly to pass those tests.
Therefore, they can develop systems in phases and control the testing
so that the system under development is protected from unwanted
testing. As a consequence, system parts may be omitted or done
perfunctorily, since the choice of tests is under the control of the
developers.

We have in mind a different circumstance in testing under statistical
control, namely, that from the outset tests are selected at random
out of an expanding (top-down) hierarchy of operational test cases.
Therefore, the system designer must be prepared to deal with a growing,
but always coherent, set of eventualities. We believe that this
circumstance, which may seem unfair or impossible at first glance, will
dramatically change the way software development is done, by forcing a
top-down system approach rather than permitting bottom-up pieces to be
conceived and built under the protection of developer-selected testing.
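A minimal sketch of random test selection of this kind, assuming operational usage is summarized by a probability for each class of input; the class names and probabilities are hypothetical. The point it illustrates is that the random draw, not the developer, decides which inputs are exercised.

```python
import random


def select_test_runs(profile, n_runs, seed=None):
    """Draw n_runs test inputs at random according to an input distribution.

    profile: hypothetical mapping {input class: probability of operational use}.
    Classes are drawn with replacement, weighted by their probabilities.
    """
    rng = random.Random(seed)
    classes = list(profile)
    weights = [profile[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n_runs)
```

For example, `select_test_runs({"nominal": 0.9, "recovery": 0.1}, 100, seed=1)` returns 100 class names, roughly 90 of them "nominal". As the system grows top-down, new classes would be added to the profile rather than hand-picked tests.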
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CLEANROOM SOFTWARE DEVELOPMENT PROCESS 


DEFINITION 

O TECHNICAL AND ORGANIZATIONAL APPROACH TO DEVELOPING 

SOFTWARE PRODUCTS WITH CERTIFIABLE RELIABILITY 

LOGICAL EXTENSION OF 

O SOFTWARE RELIABILITY THEORY 

O MODERN SOFTWARE ENGINEERING PRACTICES 

o FUNCTIONAL ORGANIZATIONAL STRUCTURE 

GOALS 

O PRODUCT RELIABILITY 

INITIALLY ADDRESS PRODUCTS IN THE RANGE OF 10-25K SLOCS

RELIABILITY TARGETS OF MTBF'S MEASURED IN MONTHS AND 
YEARS 

O STATISTICAL DESIGN 

EXPECTATION OF CORRECT SOFTWARE DESIGNS 

"BLACKBOX" TESTING OF SOFTWARE 
TESTING FOR THE OPERATIONAL ENVIRONMENT 

o PROCESS CONTROLS 

SOFTWARE PRODUCT ENGINEERING FUNCTION 
MANAGEMENT TO RELIABILITY COMMITMENTS 
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CLEANROOM SOFTWARE DEVELOPMENT PROCESS 


RELIABILITY MODEL 

O BASED ON SOFTWARE OPERATING FAILURES, NOT ERRORS IN THE CODE 

O DIFFERS FROM HARDWARE MODELS, LOGICAL NOT PHYSICAL FAILURES 

O REASONABLENESS DEMONSTRATED USING PUBLISHED SOFTWARE
FAILURE DATA 

STATISTICAL APPROACH 

O INPUT/OUTPUT SPECIFICATIONS 

o INPUT PROBABILITY DISTRIBUTIONS 

O STOCHASTIC PROCESS INTRODUCED THROUGH RANDOMLY SELECTED RUNS 

O MTBF STATISTICS DEVELOPED FROM CYCLE/FAILURE RATIO 


O CERTIFICATION BASED ON FAILURE FREE EXECUTION INTERVALS, 
NOT ERROR FREE CODE 
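The cycle/failure ratio and the failure-free intervals above can be computed from a log of randomly selected runs. This sketch assumes a "cycle" is one run and that each run is recorded simply as failed or not. The function names are hypothetical, and no confidence bounds are attempted.

```python
def failure_free_intervals(outcomes):
    """Lengths (in runs) of the intervals ending in each observed failure.

    outcomes: sequence of booleans, True when a randomly selected run failed.
    Trailing failure-free runs after the last failure do not close an interval.
    """
    intervals, count = [], 0
    for failed in outcomes:
        count += 1
        if failed:
            intervals.append(count)
            count = 0
    return intervals


def mtbf_estimate(outcomes):
    """Cycle/failure ratio: runs executed divided by failures observed.

    Returns None when no failure has been seen, since the ratio is undefined.
    """
    failures = sum(1 for failed in outcomes if failed)
    return len(outcomes) / failures if failures else None
```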
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CLEANROOM SOFTWARE DEVELOPMENT PROCESS 


CLEANROOM DEVELOPMENT METHOD

O STARTS WITH STRUCTURED SPECIFICATION 

STATE MACHINE MODEL 

o SOFTWARE DESIGN ENGINEERING PROCESS 

MODERN DESIGN METHODS 
FIRST TIME CORRECT PROGRAMS 

O SOFTWARE PRODUCT ENGINEERING PROCESS 

IDENTIFICATION OF PRODUCT INPUTS AND PROBABILITY 
DISTRIBUTIONS 

SOFTWARE INTEGRATION INTO PRODUCT FORM 
COLLECTION/CORRELATION OF FAILURE STATISTICS (MTBF) 
CERTIFICATION TO CUSTOMER 

O SOFTWARE MANAGEMENT 

RELIABILITY COMMITMENTS 

PRODUCT VISIBILITY THROUGH MTBF MEASUREMENTS 
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CLEANROOM SOFTWARE DEVELOPMENT PROCESS


DESIGN FUNDAMENTALS 

O MODERN DESIGN METHODS 

STATE MACHINES AND FUNCTIONS 

STEPWISE REFINEMENT AND CORRECTNESS PROOFS 

DATA TYPING AND ABSTRACTION 

PROCESS DESIGN LANGUAGE (PDL) DOCUMENTATION 

O MODERN IMPLEMENTATION METHODS 

PROGRAM SUPPORT LIBRARIES 
HIGH-ORDER PROGRAMMING LANGUAGES 
STRUCTURED PROGRAMMING 
REVIEWS AND INSPECTIONS 

DESIGN INNOVATIONS 

O STATISTICAL DESIGN APPROACH 

DESIGN ALWAYS EXPOSED TO RANDOMIZED OPERATING 
INPUTS 

EMPHASIS ON TOP-DOWN IMPLEMENTATION STRATEGY 

O ELIMINATION OF SOFTWARE DEBUGGING 

FOCUS TESTING ON OPERATING ENVIRONMENT 
FOCUS DESIGN ON CORRECTNESS 
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CLEANROOM SOFTWARE DEVELOPMENT PROCESS 


PRODUCT ENGINEERING STRATEGY 

O CERTIFICATION BY INDEPENDENT GROUP 

TESTING FROM SOFTWARE SPECIFICATION WITH DESIGN 
DETAILS HIDDEN 

SEPARATION OF RESPONSIBILITIES AND INTERACTIONS 
O TEST DEVELOPMENT 

ANALYSIS OF INPUT PROBABILITY DISTRIBUTIONS 
STATISTICAL/DISCRETE INPUT VALUES 
INITIALIZATION AND OUTPUT VALUES 

CONCURRENCY CONSIDERATIONS FOR PERFORMANCE TESTS 

O TEST EXECUTION 

SELECTION OF RANDOM INPUT SAMPLES 
RECORDING OF FAILURE FREE EXECUTION MATERIALS 
GENERATION OF MTBF STATISTICS 

O FAILURE DIAGNOSTIC SUPPORT 

FAULT LOCALIZATION 
REGRESSION TESTING 
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CLEANROOM SOFTWARE DEVELOPMENT PROCESS 


SOFTWARE DESIGN ENGINEER 

O CREATES THE PRODUCT 

O RESPONSIBILITY 

IMPLEMENTATION OF AN APPROVED SPECIFICATION 
DELIVERY OF CORRECT SOFTWARE TO THE PRODUCT ENGINEER 

O OUTPUTS 

SOFTWARE PRODUCT DESIGN 
SOFTWARE PRODUCT CODE
SOFTWARE PRODUCT DOCUMENTATION 

SOFTWARE PRODUCT ENGINEER 

O CERTIFIES THE PRODUCT 

O RESPONSIBILITY 

VALIDATION OF THE PRODUCT AGAINST THE SPECIFICATION

DELIVERY OF CERTIFIED SOFTWARE TO THE CUSTOMER

O OUTPUTS 

SOFTWARE PRODUCT TEST PLANS/PROCEDURES 
SOFTWARE PRODUCT INTEGRATION PLANS/PROCEDURES
SOFTWARE PRODUCT LIBRARIES 
SOFTWARE PRODUCT TEST REPORTS 
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CLEANROOM SOFTWARE DEVELOPMENT PROCESS 


TOOL REQUIREMENTS 

O LIBRARY SYSTEM 

DESIGN DOCUMENTATION 
PRODUCT CODE 

CERTIFICATION TEST SAMPLES 

O STATISTICAL MODEL 

MTBF CALCULATIONS 
TREND ANALYSES 

O SOFTWARE UTILITIES 

TEST SAMPLE BUILD 
TEST EXECUTION CONTROL 
DATA COLLECTION/REDUCTION 
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SELECTING A SOFTWARE DEVELOPMENT METHODOLOGY 


Robert E. Jones 
Hughes Aircraft Company 
Fullerton, CA 


This paper describes the “Integrated Software Development Methodology (ISDM)” project, which is
being conducted by Hughes Aircraft Company, Software Engineering Division, in Fullerton,
California, and is sponsored by the Air Force Wright Aeronautical Laboratories, Flight Dynamics
Laboratory, at Wright-Patterson AFB, Dayton, Ohio, under Contract F33615-80-C-3614.

The ISDM project is currently in progress; its purpose is to study in detail state-of-the-art
analytical techniques for the development and verification of digital flight control software and to
produce a practical, designer-oriented development and verification methodology.

SCOPE 

The scope of this project is limited to the study of existing tools and analytical techniques 
and the production of a practical ISDM guidebook. The methodology selected is adapted to flight 
control software, but is also applicable to most real time software developments. 

The problem of evaluating the complete system is called validation, while the problem of 
checking the software at each stage of the design process is called verification. This project is 
concerned with verification. 

The effectiveness of the analytic techniques chosen for the development and verification 
methodology will be assessed both technically and financially. Technical assessments analyze the 
error preventing and detecting capabilities of the chosen technique in all of the pertinent software 
development phases. Financial assessments describe the cost impact of using the techniques, 
specifically, the cost of implementing and applying the techniques as well as the realizable cost 
savings. Both the technical and financial assessments will be quantitative where possible. In the
case of techniques which cannot be quantitatively assessed, qualitative judgments will be
expressed about the effectiveness and cost of the techniques. The reasons why quantitative
assessments are not possible will be documented.

BACKGROUND 

The design of digital flight control systems has been the role of the control engineer rather 
than the computer or software specialist. Research into software design and verification has been 
the role of very specialized software experts. The results of this research have not always been 
practical in helping the flight control system designer with his tasks. Many tools and techniques 
are too complex to adapt to the flight control problem. Other tools are too expensive to maintain
and operate for the flight control problem.
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SUMMARY OF OBJECTIVES AND RESULTS 


The objectives and results discussed here reflect the individual objectives and
accomplishments to date.

Metrics 

The development of metrics which can be applied to assess the design quality was one of our 
first objectives. The effort was to be directed toward predictive metrics with the intention of 
producing metrics which can be used by a flight control systems engineer to determine the quality 
of the design produced and the likelihood of a successful implementation. 

The metrics are being developed to aid in predicting such things as how many errors are 
likely, how long it will take to test, how long it will take to correct an error, etc. 

One result is that a set of concepts which provide the foundation for the ISDM metrics
has been developed. The equations which will be used as the basis of procedures to calculate
predictors for testability, reliability and flexibility have also been defined.

Guidebook 

The overall objective is to create an integrated set of techniques and tools which are usable
by a digital flight control systems engineer for development of a DFCS. Primary emphasis is to be
on those activities involved in generating the DFCS software requirements specification, performing
the software design, and verifying the software design through software integration.

The guidebook represents the bulk of the output from this project and will be the most 
visible. Emphasis must be placed on generating a document that is clear, understandable, and 
usable while fulfilling its intended role of a guidebook. 

The results thus far have produced a draft guidebook that is ready to be applied during the
experiment. The guidebook goes beyond descriptions of the tools and techniques and their use.
It also discusses the development environment and major issues of DFCS software development,
which provide a backdrop for the actual application of the tools and techniques.

As a result of numerous reviews of various versions of the draft guidebook, there now exists
a solid foundation from which to build. This building will occur as a result of the experiment. As
different techniques are applied and as data are collected and analyzed, the guidebook will be
updated. The guidebook will be maintained in a dynamic fashion, being changed as dictated by the
experiment results.

Experiment 

Having selected candidate analytic techniques and having organized these techniques into a 
guidebook, there remains the problem of objectively and quantitatively assessing the value of these 
techniques in producing a reliable flight control software system. For this reason, an experiment 
will be conducted in which a small sample flight control system will be developed using the ISDM 
guidebook. 
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The experiment will begin with the specification and progress through all software development
phases. For each phase, an experiment will be conducted in which the analytic techniques
and tools described in the guidebook will be applied. The resources expended in the application
will be monitored, and the errors detected will be recorded and summarized.

In each of the development phases of the experiment, two classes of activity will take place. 
The first class of activities will be the actual application of the techniques in the ISDM guidebook 
to produce software. The second class of activities will be collection and analysis of the data 
pointing out the effectiveness of each technique, the impact of each technique on the overall 
schedule, the cost to prevent/detect errors, and the impact of errors on the total development 
effort. 

Results thus far include the development of the experiment plan. This document is a detailed
description of the activities which will occur. The plan includes the following factors to be
considered in evaluating the guidebook:

1. Usability by a flight control engineer,

2. Cost to use, 

3. Quality of the result and software. 

The plan delineates the following data to be captured:

1. Errors,

2. Cost, 

3. System documents, 

4. Subject comments. 

CONCLUSION 

The ISDM project has just started its second phase, the experiment. Although it is too
early to provide firm conclusions, we are already starting to see indications not only of which
tools/languages may be useful, but also of distinct weaknesses. The experiment will help to
prove out these preliminary “feelings” and provide quantification, at least when applied to
methodologies for specific applications.
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HUGHES 


INTEGRATED SOFTWARE DEVELOPMENT 

METHODOLOGY 
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


ISDM GOAL 

EVALUATE ANALYTIC METHODS FOR VERIFICATION OF 
DIGITAL FLIGHT CONTROL SOFTWARE





RESTATED 

DEVELOP GUIDEBOOK FOR AN ISDM 

(INTEGRATED SOFTWARE DEVELOPMENT METHODOLOGY) 
AND QUANTITATIVELY EVALUATE THE EFFECTS ON COST 
AND RELIABILITY OF USING ISDM FOR DEVELOPING DIGITAL 
FLIGHT CONTROL SOFTWARE 


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• DEVELOP GUIDEBOOK FOR ISDM

• CONDUCT DFCS EXPERIMENT

• EVALUATE COST AND ERROR DETECTION EFFECTIVENESS

• DEVELOP DESIGN METRICS

• EVALUATE OVERALL ISDM COST AND RELIABILITY EFFECTIVENESS


• RECOMMEND ISDM USAGE 

• RECOMMEND AREAS FOR FURTHER STUDY 
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ISDM GUIDEBOOK CONTENTS PER PHASE

• DESCRIPTION OF PROCEDURES

• DESCRIPTION OF SUPPORTING TOOLS AND TECHNIQUES

• TYPES OF ERRORS DETECTED

• REVIEW PROCEDURES

• INTERFACE WITH OTHER PHASES

• INTERFACE BETWEEN TOOLS AND TECHNIQUES

• RESOURCES NEEDED FOR EACH TOOL AND TECHNIQUE

• GUIDELINES FOR USING EACH TOOL AND TECHNIQUE
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EXPERIMENT PLAN SUMMARY 




• IMPLEMENT SAMPLE SYSTEM USING TOOLS/TECHNIQUES

• RECORD DAILY ACTIVITIES FOR COST ANALYSIS 

• RECORD GUIDEBOOK COMMENTS FOR FINAL DRAFT 

• RECORD ERRORS DETECTED AT EACH PHASE 

• ANALYZE TOOL/TECHNIQUE EFFECTIVENESS 

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EXPERIMENTAL APPROACH 





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ASSESSMENT REVIEW 


• GUIDEBOOK USABILITY AND ACCURACY 

• COST OF TOOL/TECHNIQUE APPLICATION 

• QUALITY OF SOFTWARE/DOCUMENTATION 

RELIABILITY 

SAFETY 

TESTABILITY 

MAINTAINABILITY 

FLEXIBILITY 
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TOOL ANALYSIS 






• TRAINING COSTS 

• GUIDEBOOK EFFECTIVENESS AND ACCURACY 

• ERROR DETECTION CAPABILITY 


• DOCUMENTATION SUPPORT 
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PLANNED ACTIVITIES 




• COMPLETE EXPERIMENT 

• EVALUATE RESULTS 

• RECOMMEND IMPROVEMENTS 

• TAILOR GUIDEBOOK 
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Development Techniques for Generic Software 

Richard L. Hamilton

Bell Laboratories 
Holmdel, New Jersey 07733 

1. INTRODUCTION

In developing the first version of a generic implementation of X.25,
Levels 2 and 3, we examined three development techniques: table-driven
finite state machine implementation, an integrated testing environment,
and top-down design. Although the project was not designed as an
experiment, we monitored it closely and compared the product with other
implementations of X.25 at Bell Laboratories to evaluate potential
benefits and penalties.

2. TECHNIQUES 

2.1 Finite State Machine 

A finite state machine (FSM) is a powerful tool for both specifying and 
implementing protocols. This technique was used in the X.25 
specification and has been discussed in the literature [1,2,3,4]. A
table-driven implementation of the FSM was chosen to facilitate changes 
and simplify coding. We were interested in what effect this technique 
would have on program size, speed of execution, coding time, and 
debugging time. 
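A table-driven FSM keeps protocol behavior in data rather than in nested conditionals. The Python sketch below shows the idea with a few hypothetical, heavily simplified link-setup states whose events are loosely named after the LAPB frames SABM, UA and DISC; it is not the Bell Laboratories implementation and omits most of X.25 Level 2.

```python
# Hypothetical, heavily simplified states; NOT the real X.25 Level 2 (LAPB) tables.
DISCONNECTED = "DISCONNECTED"
AWAITING_UA = "AWAITING_UA"
CONNECTED = "CONNECTED"

# The protocol lives in this table: (state, event) -> (action, next state).
TRANSITIONS = {
    (DISCONNECTED, "connect_request"): ("send_SABM", AWAITING_UA),
    (AWAITING_UA, "UA_received"): ("notify_connected", CONNECTED),
    (AWAITING_UA, "timer_expired"): ("send_SABM", AWAITING_UA),  # retry setup
    (CONNECTED, "DISC_received"): ("send_UA", DISCONNECTED),
    (CONNECTED, "disconnect_request"): ("send_DISC", DISCONNECTED),
}


def step(state, event, table=TRANSITIONS):
    """Look up one transition; (state, event) pairs not in the table are ignored."""
    return table.get((state, event), ("ignore", state))


def run(events, state=DISCONNECTED, table=TRANSITIONS):
    """Drive the machine through a sequence of events, collecting the actions."""
    actions = []
    for event in events:
        action, state = step(state, event, table)
        actions.append(action)
    return actions, state
```

Changing the protocol behavior then means editing `TRANSITIONS` rather than the control flow in `step`, which is the kind of ease of change the table-driven choice was meant to provide.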

2.2 Testing Environment 

Contrary to common practice, we built a testing environment before
coding. The complexities of a communications protocol, especially 
X.25, require careful attention to the problems of verifying that an 
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implementation of that protocol does in fact perform correctly. In 
addition, we felt that the process of verification should start as 
early as possible in the development process. The testing environment, 
which runs under the UNIX operating system, let us test the FSM and
its tables very early in the coding process. We were able to integrate 
new modules easily and test them thoroughly using this tool. 

2.3 Top-Down Design

In designing and implementing a solution, we followed a top-down
approach. This made it possible to have a "running" version at all
times, with unwritten modules replaced by dummy routines. This approach
was not rigorously followed in coding, because it was often more
sensible to code all of the routines that performed one function even
if that meant coding some low-level functions early. Doing this still
let us always have a running version, and it simplified testing.


3. MEASUREMENTS

Our main method for evaluating these techniques was comparison with 
existing implementations of X.25 at Bell Laboratories. We measured the 
size and execution speed of both our implementation and the existing 
ones and ran some simple complexity metrics. 


UNIX is a Trademark of Bell Laboratories 
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We used the testing environment to help modify and transport existing 
implementations of both Level 2 and Level 3 to a new environment, which 
gave us the opportunity to compare our versions with the existing ones 
in terms of the ease of making modifications. We kept a log of program 
bugs found and the effort it took to fix them, for all of the 
implementations, and tried to characterize the types of problems found. 

4. CONCLUSION 

A combination of a table-driven finite state machine realization, a 
comprehensive testing environment, and a top-down approach was used to 
produce an implementation of X.25, Levels 2 and 3. In comparison with 
other, ad hoc, X.25 implementations, we found that our solution ran as 
much as 20% faster, but was about 35 to 40 percent bigger. We were 
able to explain all but 11% of that difference in terms of added 
function or added flexibility. A McCabe complexity metric showed 
little difference between the implementations. 

Comparison of time spent debugging showed that our approach was 
superior to the ad hoc methods, both in terms of number of errors 
detected and time taken to correct those errors. Even so, the testing 
environment was shown to be a significant aid in debugging the other 
implementations when compared to other testing techniques. Although 
not intended as a controlled experiment, the data collected during 
development support using these techniques in similar circumstances. 
DEVELOPMENT TECHNIQUES
FOR GENERIC SOFTWARE



OBJECTIVES 

Portable 

Maintainable 

Flexible 

Modifiable 


TOOLS

• C language, minimal set of primitive functions
• Testing/development environment
• Table-driven finite state machine
• Layered approach



DEVELOPMENT ENVIRONMENT

UNIX™ Operating System
• Make
• AWK
• SCCS
• Shell
LEVEL 2 -- TESTING ENVIRONMENT

Automated Regression Testing

Interactive Debugging



FINITE STATE MACHINE


• Table-driven 

• Hierarchical 


• Parallel 
X.25 IMPLEMENTATION

[Diagram: layers labeled MESSAGE and LINK]
LEVEL 2

              LINES OF CODE   % DIFFERENCE
• Existing         1039
• Generic          1846           +78%

LEVEL 3

              LINES OF CODE   % DIFFERENCE
• Existing         1590
• Generic          2252           +42%
LEVEL 2 (bytes)

              TEXT    DATA    TOTAL   DIFFERENCE
• Existing    5688      56     5744
• Generic     6766    1236     8002     +39%

LEVEL 3 (bytes)

              TEXT    DATA    TOTAL   DIFFERENCE
• Existing    6818     268     7086
• Generic     8558     926     9484     +34%


Note: All programs compiled under the 8086 cross-compiler
with the optimize option, without primitives, and
without any debugging aids included.



Added function
  • Channel No.            200
  • Timer routines         272
  • Disconnect             186
Added flexibility
  • Action overhead        248
  • Channel select          52
  • Multi-table FSM        200
  • Table clarity          192
  • Optional prims         100
TOTAL                     1450
Actual difference         2258
Bytes unaccounted for      808



MEASUREMENTS

Size - 35 to 40% larger
Speed - 0 to 20% faster
Complexity - Equivalent